From: Liu Shixin <liushixin2@huawei.com>

stable inclusion
from stable-v5.10.163
commit f9d19f3a044ca651b0be52a4bf951ffe74259b9f
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6CIF8
CVE: CVE-2023-0615
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 94a7ad9283464b75b12516c5512541d467cefcf8 ]
syzkaller found a bug:
BUG: unable to handle page fault for address: ffffc9000a3b1000
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 100000067 P4D 100000067 PUD 10015f067 PMD 1121ca067 PTE 0
Oops: 0002 [#1] PREEMPT SMP
CPU: 0 PID: 23489 Comm: vivid-000-vid-c Not tainted 6.1.0-rc1+ #512
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:memcpy_erms+0x6/0x10
[...]
Call Trace:
 <TASK>
 ? tpg_fill_plane_buffer+0x856/0x15b0
 vivid_fillbuff+0x8ac/0x1110
 vivid_thread_vid_cap_tick+0x361/0xc90
 vivid_thread_vid_cap+0x21a/0x3a0
 kthread+0x143/0x180
 ret_from_fork+0x1f/0x30
 </TASK>
This is because we forget to check the boundary after adjusting compose->height in the V4L2_SEL_TGT_CROP case. Add v4l2_rect_map_inside() to fix this problem for this case.
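For context, v4l2_rect_map_inside() clamps a rectangle so that it lies entirely inside a boundary rectangle, which is what bounds the adjusted compose rectangle here. A paraphrased sketch of the helper (see include/media/v4l2-rect.h for the authoritative definition):

	static inline void v4l2_rect_map_inside(struct v4l2_rect *r,
						const struct v4l2_rect *boundary)
	{
		/* shrink r so it cannot be larger than the boundary ... */
		v4l2_rect_set_max_size(r, boundary);
		/* ... then shift it so it lies fully within the boundary */
		if (r->left < boundary->left)
			r->left = boundary->left;
		if (r->top < boundary->top)
			r->top = boundary->top;
		if (r->left + r->width > boundary->left + boundary->width)
			r->left = boundary->left + boundary->width - r->width;
		if (r->top + r->height > boundary->top + boundary->height)
			r->top = boundary->top + boundary->height - r->height;
	}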
Fixes: ef834f7836ec ("[media] vivid: add the video capture and output parts")
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: Longlong Xia <xialonglong1@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
---
 drivers/media/test-drivers/vivid/vivid-vid-cap.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/media/test-drivers/vivid/vivid-vid-cap.c b/drivers/media/test-drivers/vivid/vivid-vid-cap.c
index eadf28ab1e39..eeb0aeb62f79 100644
--- a/drivers/media/test-drivers/vivid/vivid-vid-cap.c
+++ b/drivers/media/test-drivers/vivid/vivid-vid-cap.c
@@ -953,6 +953,7 @@ int vivid_vid_cap_s_selection(struct file *file, void *fh, struct v4l2_selection
 		if (dev->has_compose_cap) {
 			v4l2_rect_set_min_size(compose, &min_rect);
 			v4l2_rect_set_max_size(compose, &max_rect);
+			v4l2_rect_map_inside(compose, &fmt);
 		}
 		dev->fmt_cap_rect = fmt;
 		tpg_s_buf_height(&dev->tpg, fmt.height);
From: Nikolay Aleksandrov <nikolay@nvidia.com>

mainline inclusion
from mainline-v5.16-rc8
commit f83a112bd91a494cdee671aec74e777470fb4a07
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6DKZJ
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
As reported [1], if the startup query interval is set too low in combination with a large number of startup queries, and we have multiple bridges or even a single bridge with multiple querier vlans configured, we can crash the machine. Add a 1 second minimum which must be enforced by overwriting the value if it is set lower (i.e. without returning an error) to avoid breaking user-space. If that happens, a log message is emitted to let the admin know that the startup interval has been set to the minimum. It doesn't make sense to make the startup interval lower than the normal query interval, so use the same value of 1 second. The issue has been present since these intervals could be user-controlled.
[1] https://lore.kernel.org/netdev/e8b9ce41-57b9-b6e2-a46a-ff9c791cf0ba@gmail.co...
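To make the clamp concrete: these intervals cross the netlink/sysfs boundary in clock_t ticks (USER_HZ, i.e. 1/100 s on the usual configurations), so the 1 second minimum corresponds to a value of 100. A minimal sketch of the enforced behaviour, assuming USER_HZ == 100 (illustrative only; the kernel change below is authoritative):

	/* A request of 50 ticks (0.5 s) is overwritten, not rejected;
	 * the kernel logs via br_info() before overwriting.
	 */
	static unsigned long clamp_intvl(unsigned long req_jiffies,
					 unsigned long min_jiffies)
	{
		return req_jiffies < min_jiffies ? min_jiffies : req_jiffies;
	}

For example, `ip link set br0 type bridge mcast_startup_query_intvl 50' would now leave the interval at 100 ticks (1 s) and emit the informational message.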
Fixes: d902eee43f19 ("bridge: Add multicast count/interval sysfs entries")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Conflicts:
	net/bridge/br_multicast.c
	net/bridge/br_netlink.c
	net/bridge/br_private.h
	net/bridge/br_sysfs_br.c

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
---
 net/bridge/br_multicast.c | 16 ++++++++++++++++
 net/bridge/br_netlink.c   |  2 +-
 net/bridge/br_private.h   |  4 ++++
 net/bridge/br_sysfs_br.c  |  2 +-
 4 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index e5328a2777ec..8ea85a4718f9 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -3609,6 +3609,22 @@ int br_multicast_set_mld_version(struct net_bridge *br, unsigned long val)
 }
 #endif
 
+void br_multicast_set_startup_query_intvl(struct net_bridge *br,
+					  unsigned long val)
+{
+	unsigned long intvl_jiffies = clock_t_to_jiffies(val);
+
+	if (intvl_jiffies < BR_MULTICAST_STARTUP_QUERY_INTVL_MIN) {
+		br_info(br,
+			"trying to set multicast startup query interval below minimum, setting to %lu (%ums)\n",
+			jiffies_to_clock_t(BR_MULTICAST_STARTUP_QUERY_INTVL_MIN),
+			jiffies_to_msecs(BR_MULTICAST_STARTUP_QUERY_INTVL_MIN));
+		intvl_jiffies = BR_MULTICAST_STARTUP_QUERY_INTVL_MIN;
+	}
+
+	br->multicast_startup_query_interval = intvl_jiffies;
+}
+
 /**
  * br_multicast_list_adjacent - Returns snooped multicast addresses
  * @dev: The bridge port adjacent to which to retrieve addresses
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 31b00ba5dcc8..1f85c1b77c89 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -1298,7 +1298,7 @@ static int br_changelink(struct net_device *brdev, struct nlattr *tb[],
 	if (data[IFLA_BR_MCAST_STARTUP_QUERY_INTVL]) {
 		u64 val = nla_get_u64(data[IFLA_BR_MCAST_STARTUP_QUERY_INTVL]);
 
-		br->multicast_startup_query_interval = clock_t_to_jiffies(val);
+		br_multicast_set_startup_query_intvl(br, val);
 	}
 
 	if (data[IFLA_BR_MCAST_STATS_ENABLED]) {
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 2b88b17cc8b2..e439d9aae45a 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -28,6 +28,8 @@
 #define BR_MAX_PORTS	(1<<BR_PORT_BITS)
 
 #define BR_MULTICAST_DEFAULT_HASH_MAX 4096
+#define BR_MULTICAST_QUERY_INTVL_MIN msecs_to_jiffies(1000)
+#define BR_MULTICAST_STARTUP_QUERY_INTVL_MIN BR_MULTICAST_QUERY_INTVL_MIN
 
 #define BR_VERSION	"2.3"
 
@@ -1568,4 +1570,6 @@ void br_do_proxy_suppress_arp(struct sk_buff *skb, struct net_bridge *br,
 void br_do_suppress_nd(struct sk_buff *skb, struct net_bridge *br,
 		       u16 vid, struct net_bridge_port *p, struct nd_msg *msg);
 struct nd_msg *br_is_nd_neigh_msg(struct sk_buff *skb, struct nd_msg *m);
+void br_multicast_set_startup_query_intvl(struct net_bridge *br,
+					  unsigned long val);
 #endif
diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
index 7db06e3f642a..1fdc491f648a 100644
--- a/net/bridge/br_sysfs_br.c
+++ b/net/bridge/br_sysfs_br.c
@@ -640,7 +640,7 @@ static ssize_t multicast_startup_query_interval_show(
 
 static int set_startup_query_interval(struct net_bridge *br, unsigned long val)
 {
-	br->multicast_startup_query_interval = clock_t_to_jiffies(val);
+	br_multicast_set_startup_query_intvl(br, val);
 	return 0;
 }
From: Nikolay Aleksandrov <nikolay@nvidia.com>

mainline inclusion
from mainline-v5.16-rc8
commit 99b40610956a8a8755653a67392e2a8b772453be
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6DKZJ
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
As reported [1], if the query interval is set too low and we have multiple bridges or even a single bridge with multiple querier vlans configured, we can crash the machine. Add a 1 second minimum which must be enforced by overwriting the value if it is set lower (i.e. without returning an error) to avoid breaking user-space. If that happens, a log message is emitted to let the administrator know that the interval has been set to the minimum. The issue has been present since these intervals could be user-controlled.
[1] https://lore.kernel.org/netdev/e8b9ce41-57b9-b6e2-a46a-ff9c791cf0ba@gmail.co...
Fixes: d902eee43f19 ("bridge: Add multicast count/interval sysfs entries")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Conflicts:
	net/bridge/br_multicast.c
	net/bridge/br_netlink.c
	net/bridge/br_private.h
	net/bridge/br_sysfs_br.c

Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
---
 net/bridge/br_multicast.c | 16 ++++++++++++++++
 net/bridge/br_netlink.c   |  2 +-
 net/bridge/br_private.h   |  2 ++
 net/bridge/br_sysfs_br.c  |  2 +-
 4 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index 8ea85a4718f9..de79fbfe1611 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -3625,6 +3625,22 @@ void br_multicast_set_startup_query_intvl(struct net_bridge *br,
 	br->multicast_startup_query_interval = intvl_jiffies;
 }
 
+void br_multicast_set_query_intvl(struct net_bridge *br,
+				  unsigned long val)
+{
+	unsigned long intvl_jiffies = clock_t_to_jiffies(val);
+
+	if (intvl_jiffies < BR_MULTICAST_QUERY_INTVL_MIN) {
+		br_info(br,
+			"trying to set multicast query interval below minimum, setting to %lu (%ums)\n",
+			jiffies_to_clock_t(BR_MULTICAST_QUERY_INTVL_MIN),
+			jiffies_to_msecs(BR_MULTICAST_QUERY_INTVL_MIN));
+		intvl_jiffies = BR_MULTICAST_QUERY_INTVL_MIN;
+	}
+
+	br->multicast_query_interval = intvl_jiffies;
+}
+
 /**
  * br_multicast_list_adjacent - Returns snooped multicast addresses
  * @dev: The bridge port adjacent to which to retrieve addresses
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 1f85c1b77c89..bed6c798fea9 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -1286,7 +1286,7 @@ static int br_changelink(struct net_device *brdev, struct nlattr *tb[],
 	if (data[IFLA_BR_MCAST_QUERY_INTVL]) {
 		u64 val = nla_get_u64(data[IFLA_BR_MCAST_QUERY_INTVL]);
 
-		br->multicast_query_interval = clock_t_to_jiffies(val);
+		br_multicast_set_query_intvl(br, val);
 	}
 
 	if (data[IFLA_BR_MCAST_QUERY_RESPONSE_INTVL]) {
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index e439d9aae45a..bfd0e995b078 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -1572,4 +1572,6 @@ void br_do_suppress_nd(struct sk_buff *skb, struct net_bridge *br,
 struct nd_msg *br_is_nd_neigh_msg(struct sk_buff *skb, struct nd_msg *m);
 void br_multicast_set_startup_query_intvl(struct net_bridge *br,
 					  unsigned long val);
+void br_multicast_set_query_intvl(struct net_bridge *br,
+				  unsigned long val);
 #endif
diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
index 1fdc491f648a..c8b477cbc4a3 100644
--- a/net/bridge/br_sysfs_br.c
+++ b/net/bridge/br_sysfs_br.c
@@ -594,7 +594,7 @@ static ssize_t multicast_query_interval_show(struct device *d,
 
 static int set_query_interval(struct net_bridge *br, unsigned long val)
 {
-	br->multicast_query_interval = clock_t_to_jiffies(val);
+	br_multicast_set_query_intvl(br, val);
 	return 0;
 }
From: ZhaoLong Wang <wangzhaolong1@huawei.com>

mainline inclusion
from mainline-v6.2-rc8
commit aa5465aeca3c66fecdf7efcf554aed79b4c4b211
category: bugfix
bugzilla: 188381, https://gitee.com/openeuler/kernel/issues/I644ST
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
------------------------------------------------------
When the network status is unstable, a use-after-free may occur when reading data from the server.
BUG: KASAN: use-after-free in readpages_fill_pages+0x14c/0x7e0
Call Trace:
 <TASK>
 dump_stack_lvl+0x38/0x4c
 print_report+0x16f/0x4a6
 kasan_report+0xb7/0x130
 readpages_fill_pages+0x14c/0x7e0
 cifs_readv_receive+0x46d/0xa40
 cifs_demultiplex_thread+0x121c/0x1490
 kthread+0x16b/0x1a0
 ret_from_fork+0x2c/0x50
 </TASK>

Allocated by task 2535:
 kasan_save_stack+0x22/0x50
 kasan_set_track+0x25/0x30
 __kasan_kmalloc+0x82/0x90
 cifs_readdata_direct_alloc+0x2c/0x110
 cifs_readdata_alloc+0x2d/0x60
 cifs_readahead+0x393/0xfe0
 read_pages+0x12f/0x470
 page_cache_ra_unbounded+0x1b1/0x240
 filemap_get_pages+0x1c8/0x9a0
 filemap_read+0x1c0/0x540
 cifs_strict_readv+0x21b/0x240
 vfs_read+0x395/0x4b0
 ksys_read+0xb8/0x150
 do_syscall_64+0x3f/0x90
 entry_SYSCALL_64_after_hwframe+0x72/0xdc

Freed by task 79:
 kasan_save_stack+0x22/0x50
 kasan_set_track+0x25/0x30
 kasan_save_free_info+0x2e/0x50
 __kasan_slab_free+0x10e/0x1a0
 __kmem_cache_free+0x7a/0x1a0
 cifs_readdata_release+0x49/0x60
 process_one_work+0x46c/0x760
 worker_thread+0x2a4/0x6f0
 kthread+0x16b/0x1a0
 ret_from_fork+0x2c/0x50

Last potentially related work creation:
 kasan_save_stack+0x22/0x50
 __kasan_record_aux_stack+0x95/0xb0
 insert_work+0x2b/0x130
 __queue_work+0x1fe/0x660
 queue_work_on+0x4b/0x60
 smb2_readv_callback+0x396/0x800
 cifs_abort_connection+0x474/0x6a0
 cifs_reconnect+0x5cb/0xa50
 cifs_readv_from_socket.cold+0x22/0x6c
 cifs_read_page_from_socket+0xc1/0x100
 readpages_fill_pages.cold+0x2f/0x46
 cifs_readv_receive+0x46d/0xa40
 cifs_demultiplex_thread+0x121c/0x1490
 kthread+0x16b/0x1a0
 ret_from_fork+0x2c/0x50
The following function calls will cause UAF of the rdata pointer.
readpages_fill_pages
 cifs_read_page_from_socket
  cifs_readv_from_socket
   cifs_reconnect
    __cifs_reconnect
     cifs_abort_connection
      mid->callback() --> smb2_readv_callback
       queue_work(&rdata->work)   # if the worker completes first,
                                  # the rdata is freed:
                                  #   cifs_readv_complete
                                  #     kref_put
                                  #       cifs_readdata_release
                                  #         kfree(rdata)
 return rdata->...                # UAF in readpages_fill_pages()
Similarly, this problem also occurs in uncached_fill_pages().

Fix this by adjusting the order of the conditions in the return statement: test result != -ECONNABORTED before rdata->got_bytes, so that rdata is not dereferenced once the connection has been aborted and the readdata possibly freed.
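The fix relies on C's short-circuit evaluation of &&: when result == -ECONNABORTED (the only case in which cifs_reconnect() can have let the completion worker free rdata), the left operand is already false and rdata is never dereferenced. Schematically:

	/* before: reads rdata->got_bytes even when result == -ECONNABORTED */
	return rdata->got_bytes > 0 && result != -ECONNABORTED ?
		rdata->got_bytes : result;

	/* after: the connection check short-circuits first, so rdata is
	 * only touched while it is guaranteed to still be alive
	 */
	return result != -ECONNABORTED && rdata->got_bytes > 0 ?
		rdata->got_bytes : result;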
Signed-off-by: ZhaoLong Wang <wangzhaolong1@huawei.com>
Cc: stable@vger.kernel.org
Acked-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
---
 fs/cifs/file.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 6c06870f9018..6a86a6533dde 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -3533,7 +3533,7 @@ uncached_fill_pages(struct TCP_Server_Info *server,
 		rdata->got_bytes += result;
 	}
 
-	return rdata->got_bytes > 0 && result != -ECONNABORTED ?
+	return result != -ECONNABORTED && rdata->got_bytes > 0 ?
 		rdata->got_bytes : result;
 }
 
@@ -4287,7 +4287,7 @@ readpages_fill_pages(struct TCP_Server_Info *server,
 		rdata->got_bytes += result;
 	}
 
-	return rdata->got_bytes > 0 && result != -ECONNABORTED ?
+	return result != -ECONNABORTED && rdata->got_bytes > 0 ?
 		rdata->got_bytes : result;
 }
From: Stanislav Fomichev <sdf@google.com>

stable inclusion
from stable-v5.10.163
commit 6d935a02658be82585ecb39aab339faa84496650
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6AVM6
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 07ec7b502800ba9f7b8b15cb01dd6556bb41aaca ]
syzkaller managed to trigger another case where skb->len == 0 when we enter __dev_queue_xmit:
WARNING: CPU: 0 PID: 2470 at include/linux/skbuff.h:2576 skb_assert_len include/linux/skbuff.h:2576 [inline]
WARNING: CPU: 0 PID: 2470 at include/linux/skbuff.h:2576 __dev_queue_xmit+0x2069/0x35e0 net/core/dev.c:4295

Call Trace:
 dev_queue_xmit+0x17/0x20 net/core/dev.c:4406
 __bpf_tx_skb net/core/filter.c:2115 [inline]
 __bpf_redirect_no_mac net/core/filter.c:2140 [inline]
 __bpf_redirect+0x5fb/0xda0 net/core/filter.c:2163
 ____bpf_clone_redirect net/core/filter.c:2447 [inline]
 bpf_clone_redirect+0x247/0x390 net/core/filter.c:2419
 bpf_prog_48159a89cb4a9a16+0x59/0x5e
 bpf_dispatcher_nop_func include/linux/bpf.h:897 [inline]
 __bpf_prog_run include/linux/filter.h:596 [inline]
 bpf_prog_run include/linux/filter.h:603 [inline]
 bpf_test_run+0x46c/0x890 net/bpf/test_run.c:402
 bpf_prog_test_run_skb+0xbdc/0x14c0 net/bpf/test_run.c:1170
 bpf_prog_test_run+0x345/0x3c0 kernel/bpf/syscall.c:3648
 __sys_bpf+0x43a/0x6c0 kernel/bpf/syscall.c:5005
 __do_sys_bpf kernel/bpf/syscall.c:5091 [inline]
 __se_sys_bpf kernel/bpf/syscall.c:5089 [inline]
 __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5089
 do_syscall_64+0x54/0x70 arch/x86/entry/common.c:48
 entry_SYSCALL_64_after_hwframe+0x61/0xc6
The reproducer doesn't really reproduce outside of the syzkaller environment, so I'm taking a guess here. It looks like we do generate a correct ETH_HLEN-sized packet, but we then redirect the packet to the tunneling device. Before we do so, we __skb_pull the l2 header and arrive again at skb->len == 0. Doesn't seem like we can do anything better than having an explicit check after __skb_pull?
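A sketch of the suspected sequence, with assumed sizes (illustrative only, not the kernel source):

	/* Redirecting to an L3/tunnel device strips the MAC header first. */
	static int redirect_no_mac_sketch(struct sk_buff *skb, unsigned int mlen)
	{
		__skb_pull(skb, mlen);          /* skb->len: ETH_HLEN -> 0   */
		if (unlikely(!skb->len)) {      /* the check this patch adds */
			kfree_skb(skb);
			return -ERANGE;
		}
		return dev_queue_xmit(skb);     /* would assert on len == 0  */
	}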
Cc: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot+f635e86ec3fa0a37e019@syzkaller.appspotmail.com
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20221027225537.353077-1-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Liu Jian <liujian56@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
---
 net/core/filter.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index 8102f8ddc974..8c7fa0025341 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2127,6 +2127,10 @@ static int __bpf_redirect_no_mac(struct sk_buff *skb, struct net_device *dev,
 
 	if (mlen) {
 		__skb_pull(skb, mlen);
+		if (unlikely(!skb->len)) {
+			kfree_skb(skb);
+			return -ERANGE;
+		}
 
 		/* At ingress, the mac header has already been pulled once.
 		 * At egress, skb_pospull_rcsum has to be done in case that
From: Eric Dumazet <edumazet@google.com>

stable inclusion
from stable-v5.10.163
commit be719496ae6a7fc325e9e5056a52f63ebc84cc0c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6AVM6
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 0a182f8d607464911756b4dbef5d6cad8de22469 ]
sock_map_free() calls release_sock(sk) without owning a reference on the socket. This can cause a use-after-free, as syzbot found [1].
Jakub Sitnicki already took care of a similar issue in sock_hash_free() in commit 75e68e5bf2c7 ("bpf, sockhash: Synchronize delete from bucket list on map free")
[1]
refcount_t: decrement hit 0; leaking memory.
WARNING: CPU: 0 PID: 3785 at lib/refcount.c:31 refcount_warn_saturate+0x17c/0x1a0 lib/refcount.c:31
Modules linked in:
CPU: 0 PID: 3785 Comm: kworker/u4:6 Not tainted 6.1.0-rc7-syzkaller-00103-gef4d3ea40565 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
Workqueue: events_unbound bpf_map_free_deferred
RIP: 0010:refcount_warn_saturate+0x17c/0x1a0 lib/refcount.c:31
Code: 68 8b 31 c0 e8 75 71 15 fd 0f 0b e9 64 ff ff ff e8 d9 6e 4e fd c6 05 62 9c 3d 0a 01 48 c7 c7 80 bb 68 8b 31 c0 e8 54 71 15 fd <0f> 0b e9 43 ff ff ff 89 d9 80 e1 07 80 c1 03 38 c1 0f 8c a2 fe ff
RSP: 0018:ffffc9000456fb60 EFLAGS: 00010246
RAX: eae59bab72dcd700 RBX: 0000000000000004 RCX: ffff8880207057c0
RDX: 0000000000000000 RSI: 0000000000000201 RDI: 0000000000000000
RBP: 0000000000000004 R08: ffffffff816fdabd R09: fffff520008adee5
R10: fffff520008adee5 R11: 1ffff920008adee4 R12: 0000000000000004
R13: dffffc0000000000 R14: ffff88807b1c6c00 R15: 1ffff1100f638dcf
FS:  0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b30c30000 CR3: 000000000d08e000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 __refcount_dec include/linux/refcount.h:344 [inline]
 refcount_dec include/linux/refcount.h:359 [inline]
 __sock_put include/net/sock.h:779 [inline]
 tcp_release_cb+0x2d0/0x360 net/ipv4/tcp_output.c:1092
 release_sock+0xaf/0x1c0 net/core/sock.c:3468
 sock_map_free+0x219/0x2c0 net/core/sock_map.c:356
 process_one_work+0x81c/0xd10 kernel/workqueue.c:2289
 worker_thread+0xb14/0x1330 kernel/workqueue.c:2436
 kthread+0x266/0x300 kernel/kthread.c:376
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
 </TASK>
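The fix applies the usual pin-before-lock idiom: take a reference before lock_sock() so that release_sock(), which may run tcp_release_cb() and drop a reference, cannot free the socket underneath us. A sketch of the pattern (mirroring the patch below):

	sock_hold(sk);          /* pin the socket across the locked region */
	lock_sock(sk);
	/* ... tear-down work ... */
	release_sock(sk);       /* may drop a ref via tcp_release_cb()     */
	sock_put(sk);           /* drop our pin last                       */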
Fixes: 7e81a3530206 ("bpf: Sockmap, ensure sock lock held during tear down")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Song Liu <songliubraving@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20221202111640.2745533-1-edumazet@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Liu Jian <liujian56@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
---
 net/core/sock_map.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 98b5a7bb2226..f8c287788bea 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -358,11 +358,13 @@ static void sock_map_free(struct bpf_map *map)
 
 		sk = xchg(psk, NULL);
 		if (sk) {
+			sock_hold(sk);
 			lock_sock(sk);
 			rcu_read_lock();
 			sock_map_unref(sk, psk);
 			rcu_read_unlock();
 			release_sock(sk);
+			sock_put(sk);
 		}
 	}
From: Cui GaoSheng <cuigaosheng1@huawei.com>

hulk inclusion
category: bugfix
bugzilla: 188368, https://gitee.com/openeuler/kernel/issues/I6EEK7
CVE: NA
--------------------------------
Avoid using conflicting compilation parameters: move -fpic from KBUILD_CFLAGS to CFLAGS_KERNEL so that -fpic (kernel objects) and -fno-pic (module objects, via CFLAGS_MODULE) are no longer passed together.
Fixes: 6bc05b0a27f0 ("arm32: kaslr: Fix the bug of module install failure")
Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
---
 arch/arm/Makefile | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index 8300530757ae..d6817de42f24 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -49,7 +49,8 @@ KBUILD_LDFLAGS	+= -EL
 endif
 
 ifeq ($(CONFIG_RELOCATABLE),y)
-KBUILD_CFLAGS += -fpic -include $(srctree)/include/linux/hidden.h
+KBUILD_CFLAGS += -include $(srctree)/include/linux/hidden.h
+CFLAGS_KERNEL += -fpic
 CFLAGS_MODULE += -fno-pic
 LDFLAGS_vmlinux += -pie -shared -Bsymbolic
 endif
From: Stanislav Fomichev <sdf@google.com>

stable inclusion
from stable-v5.10.163
commit 148dcbd3af039ae39c3af697a3183008c7995805
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6F7AI
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 9f225444467b98579cf28d94f4ad053460dfdb84 ]
Syzkaller triggered flow dissector warning with the following:
r0 = openat$ppp(0xffffffffffffff9c, &(0x7f0000000000), 0xc0802, 0x0)
ioctl$PPPIOCNEWUNIT(r0, 0xc004743e, &(0x7f00000000c0))
ioctl$PPPIOCSACTIVE(r0, 0x40107446, &(0x7f0000000240)={0x2, &(0x7f0000000180)=[{0x20, 0x0, 0x0, 0xfffff034}, {0x6}]})
pwritev(r0, &(0x7f0000000040)=[{&(0x7f0000000140)='\x00!', 0x2}], 0x1, 0x0, 0x0)

[    9.485814] WARNING: CPU: 3 PID: 329 at net/core/flow_dissector.c:1016 __skb_flow_dissect+0x1ee0/0x1fa0
[    9.485929]  skb_get_poff+0x53/0xa0
[    9.485937]  bpf_skb_get_pay_offset+0xe/0x20
[    9.485944]  ? ppp_send_frame+0xc2/0x5b0
[    9.485949]  ? _raw_spin_unlock_irqrestore+0x40/0x60
[    9.485958]  ? __ppp_xmit_process+0x7a/0xe0
[    9.485968]  ? ppp_xmit_process+0x5b/0xb0
[    9.485974]  ? ppp_write+0x12a/0x190
[    9.485981]  ? do_iter_write+0x18e/0x2d0
[    9.485987]  ? __import_iovec+0x30/0x130
[    9.485997]  ? do_pwritev+0x1b6/0x240
[    9.486016]  ? trace_hardirqs_on+0x47/0x50
[    9.486023]  ? __x64_sys_pwritev+0x24/0x30
[    9.486026]  ? do_syscall_64+0x3d/0x80
[    9.486031]  ? entry_SYSCALL_64_after_hwframe+0x63/0xcd
The flow dissector tries to find the skb's net namespace either via the device or via the socket. Neither is set in ppp_send_frame(), so let's manually use ppp->dev.
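For context, a paraphrase of how the flow dissector resolves the namespace (illustrative; the exact code differs across kernel versions):

	/* in __skb_flow_dissect(), roughly: */
	if (skb->dev)
		net = dev_net(skb->dev);
	else if (skb->sk)
		net = sock_net(skb->sk);
	WARN_ON_ONCE(!net);     /* the warning triggered above */

Setting skb->dev in ppp_send_frame() lets the first branch resolve the namespace.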
Cc: Paul Mackerras <paulus@samba.org>
Cc: linux-ppp@vger.kernel.org
Reported-by: syzbot+41cab52ab62ee99ed24a@syzkaller.appspotmail.com
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Baisong Zhong <zhongbaisong@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
---
 drivers/net/ppp/ppp_generic.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index 2b9815ec4a62..b825c6a9b6dd 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -1610,6 +1610,8 @@ ppp_send_frame(struct ppp *ppp, struct sk_buff *skb)
 	int len;
 	unsigned char *cp;
 
+	skb->dev = ppp->dev;
+
 	if (proto < 0x8000) {
 #ifdef CONFIG_PPP_FILTER
 	/* check if we should pass this packet */
From: Rik van Riel <riel@surriel.com>

mainline inclusion
from mainline-v6.1-rc2
commit 12df140f0bdfae5dcfc81800970dd7f6f632e00c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6EVPO
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
The h->*_huge_pages counters are protected by the hugetlb_lock, but alloc_huge_page has a corner case where it can decrement the counter outside of the lock.
This could lead to a corrupted value of h->resv_huge_pages, which we have observed on our systems.
Take the hugetlb_lock before decrementing h->resv_huge_pages to avoid a potential race.
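The decrement is a plain read-modify-write, so two racing allocations can corrupt the counter; an illustrative interleaving (values assumed):

	/* h->resv_huge_pages == 2 initially, no lock held:
	 *
	 *   CPU0                          CPU1
	 *   read resv_huge_pages (2)
	 *                                 read resv_huge_pages (2)
	 *   write resv_huge_pages = 1
	 *                                 write resv_huge_pages = 1
	 *
	 * One decrement is lost. Taking hugetlb_lock before the
	 * decrement serializes the read-modify-write.
	 */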
Link: https://lkml.kernel.org/r/20221017202505.0e6a4fcd@imladris.surriel.com
Fixes: a88c76954804 ("mm: hugetlb: fix hugepage memory leak caused by wrong reserve count")
Signed-off-by: Rik van Riel <riel@surriel.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Glen McCready <gkmccready@meta.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Zhang Peng <zhangpeng362@huawei.com>
Reviewed-by: tong tiangen <tongtiangen@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
---
 mm/hugetlb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 5b0d2264b99b..72e16b758471 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2572,11 +2572,11 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
 		page = alloc_buddy_huge_page_with_mpol(h, vma, addr);
 		if (!page)
 			goto out_uncharge_cgroup;
+		spin_lock_irq(&hugetlb_lock);
 		if (!avoid_reserve && vma_has_reserves(vma, gbl_chg)) {
 			SetHPageRestoreReserve(page);
 			h->resv_huge_pages--;
 		}
-		spin_lock_irq(&hugetlb_lock);
 		list_add(&page->lru, &h->hugepage_activelist);
 		/* Fall through */
 	}
From: Kefeng Wang <wangkefeng.wang@huawei.com>

mainline inclusion
from mainline-v6.2-rc7
commit ac86f547ca1002aec2ef66b9e64d03f45bbbfbb9
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6BYND
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
As with commit 18365225f044 ("hwpoison, memcg: forcibly uncharge LRU pages"), hwpoison will forcibly uncharge an LRU hwpoisoned page, so folio_memcg can be NULL and mem_cgroup_track_foreign_dirty_slowpath() can hit a NULL pointer dereference. Fix it by not recording foreign writebacks in mem_cgroup_track_foreign_dirty() when the folio memcg is NULL.
Link: https://lkml.kernel.org/r/20230129040945.180629-1-wangkefeng.wang@huawei.com
Fixes: 97b27821b485 ("writeback, memcg: Implement foreign dirty flushing")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reported-by: Ma Wupeng <mawupeng1@huawei.com>
Tested-by: Miko Larsson <mikoxyzzz@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Ma Wupeng <mawupeng1@huawei.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Conflicts:
	include/linux/memcontrol.h

Signed-off-by: Lu Jialin <lujialin4@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Reviewed-by: guozihua 00570089 <guozihua@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
---
 include/linux/memcontrol.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 6b681ee988b3..7eb72defec9e 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -1860,10 +1860,13 @@ void mem_cgroup_track_foreign_dirty_slowpath(struct page *page,
 static inline void mem_cgroup_track_foreign_dirty(struct page *page,
 						  struct bdi_writeback *wb)
 {
+	struct mem_cgroup *memcg;
+
 	if (mem_cgroup_disabled())
 		return;
 
-	if (unlikely(&page_memcg(page)->css != wb->memcg_css))
+	memcg = page_memcg(page);
+	if (unlikely(memcg && &memcg->css != wb->memcg_css))
 		mem_cgroup_track_foreign_dirty_slowpath(page, wb);
 }
From: Huajingjing <huajingjing1@huawei.com>

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I67J42
CVE: NA
-------------------------------------------------
The BMA software is a system management software offered by Huawei. It supports the status monitoring, performance monitoring and event monitoring of various components, including server CPUs, memory, hard disks, NICs, IB cards, PCIe cards, RAID controller cards and optical modules.

This version fixes an issue where the system could reset because of the BMA spin lock when memory usage was too high.
Signed-off-by: Huajingjing <huajingjing1@huawei.com>
Reviewed-by: ChenJiesong <chenjiesong@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
---
 .../ethernet/huawei/bma/cdev_drv/bma_cdev.c   |   2 +-
 .../huawei/bma/edma_drv/bma_devintf.c         | 110 +++++++++++-------
 .../ethernet/huawei/bma/edma_drv/bma_pci.h    |   2 +-
 .../huawei/bma/kbox_drv/kbox_include.h        |   2 +-
 .../ethernet/huawei/bma/kbox_drv/kbox_panic.c |   4 +-
 .../huawei/bma/kbox_drv/kbox_printk.c         |   4 +-
 .../huawei/bma/kbox_drv/kbox_ram_op.c         |   2 +-
 .../ethernet/huawei/bma/veth_drv/veth_hb.c    |   7 +-
 .../ethernet/huawei/bma/veth_drv/veth_hb.h    |   2 +-
 9 files changed, 81 insertions(+), 54 deletions(-)

diff --git a/drivers/net/ethernet/huawei/bma/cdev_drv/bma_cdev.c b/drivers/net/ethernet/huawei/bma/cdev_drv/bma_cdev.c
index 0348a83005d6..9468a5a0c768 100644
--- a/drivers/net/ethernet/huawei/bma/cdev_drv/bma_cdev.c
+++ b/drivers/net/ethernet/huawei/bma/cdev_drv/bma_cdev.c
@@ -28,7 +28,7 @@
 #ifdef DRV_VERSION
 #define CDEV_VERSION MICRO_TO_STR(DRV_VERSION)
 #else
-#define CDEV_VERSION "0.3.4"
+#define CDEV_VERSION "0.3.5"
 #endif
 
 #define CDEV_DEFAULT_NUM 4
diff --git a/drivers/net/ethernet/huawei/bma/edma_drv/bma_devintf.c b/drivers/net/ethernet/huawei/bma/edma_drv/bma_devintf.c
index acf6bbfc50ff..149c393e8b0d 100644
--- a/drivers/net/ethernet/huawei/bma/edma_drv/bma_devintf.c
+++ b/drivers/net/ethernet/huawei/bma/edma_drv/bma_devintf.c
@@ -419,8 +419,10 @@ EXPORT_SYMBOL(bma_intf_int_to_bmc);
 
 int bma_intf_is_link_ok(void)
 {
-	return (g_bma_dev->edma_host.statistics.remote_status ==
-		REGISTERED) ? 1 : 0;
+	if ((&g_bma_dev->edma_host != NULL) &&
+	    (g_bma_dev->edma_host.statistics.remote_status == REGISTERED))
+		return 1;
+	return 0;
 }
 EXPORT_SYMBOL(bma_intf_is_link_ok);
 
@@ -460,14 +462,10 @@ int bma_cdev_recv_msg(void *handle, char __user *data, size_t count)
 }
 EXPORT_SYMBOL_GPL(bma_cdev_recv_msg);
 
-int bma_cdev_add_msg(void *handle, const char __user *msg, size_t msg_len)
+static int check_cdev_add_msg_param(struct bma_priv_data_s *handle,
+				    const char __user *msg, size_t msg_len)
 {
 	struct bma_priv_data_s *priv = NULL;
-	struct edma_msg_hdr_s *hdr = NULL;
-	unsigned long flags = 0;
-	int total_len = 0;
-	int ret = 0;
-	struct edma_host_s *phost = &g_bma_dev->edma_host;
 
 	if (!handle || !msg || msg_len == 0) {
 		BMA_LOG(DLOG_DEBUG, "input NULL point!\n");
@@ -479,54 +477,80 @@ int bma_cdev_add_msg(void *handle, const char __user *msg, size_t msg_len)
 		return -EINVAL;
 	}
 
-	priv = (struct bma_priv_data_s *)handle;
+	priv = handle;
 
 	if (priv->user.type >= TYPE_MAX) {
 		BMA_LOG(DLOG_DEBUG, "error type = %d\n", priv->user.type);
 		return -EFAULT;
 	}
-	total_len = SIZE_OF_MSG_HDR + msg_len;
+
+	return 0;
+}
+
+static void edma_msg_hdr_init(struct edma_msg_hdr_s *hdr,
+			      struct bma_priv_data_s *private_data,
+			      char *msg_buf, size_t msg_len)
+{
+	hdr->type = private_data->user.type;
+	hdr->sub_type = private_data->user.sub_type;
+	hdr->user_id = private_data->user.user_id;
+	hdr->datalen = msg_len;
+	BMA_LOG(DLOG_DEBUG, "msg_len is %zu\n", msg_len);
+
+	memcpy(hdr->data, msg_buf, msg_len);
+}
+
+int bma_cdev_add_msg(void *handle, const char __user *msg, size_t msg_len)
+{
+	struct bma_priv_data_s *priv = NULL;
+	struct edma_msg_hdr_s *hdr = NULL;
+	unsigned long flags = 0;
+	unsigned int total_len = 0;
+	int ret = 0;
+	struct edma_host_s *phost = &g_bma_dev->edma_host;
+	char *msg_buf = NULL;
+
+	ret = check_cdev_add_msg_param(handle, msg, msg_len);
+	if (ret != 0)
+		return ret;
+
+	priv = (struct bma_priv_data_s *)handle;
+
+	total_len = (unsigned int)(SIZE_OF_MSG_HDR + msg_len);
+	if (phost->msg_send_write + total_len > HOST_MAX_SEND_MBX_LEN - SIZE_OF_MBX_HDR) {
+		BMA_LOG(DLOG_DEBUG, "msg lost,msg_send_write: %u,msg_len:%u,max_len: %d\n",
+			phost->msg_send_write, total_len, HOST_MAX_SEND_MBX_LEN);
+		return -ENOSPC;
+	}
+
+	msg_buf = (char *)kmalloc(msg_len, GFP_KERNEL);
+	if (!msg_buf) {
+		BMA_LOG(DLOG_ERROR, "malloc msg_buf failed\n");
+		return -ENOMEM;
+	}
+
+	if (copy_from_user(msg_buf, msg, msg_len)) {
+		BMA_LOG(DLOG_ERROR, "copy_from_user error\n");
+		kfree(msg_buf);
+		return -EFAULT;
+	}
 
 	spin_lock_irqsave(&phost->send_msg_lock, flags);
 
-	if (phost->msg_send_write + total_len <=
-	    HOST_MAX_SEND_MBX_LEN - SIZE_OF_MBX_HDR) {
-		hdr = (struct edma_msg_hdr_s *)(phost->msg_send_buf +
-						phost->msg_send_write);
-		hdr->type = priv->user.type;
-		hdr->sub_type = priv->user.sub_type;
-		hdr->user_id = priv->user.user_id;
-		hdr->datalen = msg_len;
-		BMA_LOG(DLOG_DEBUG, "msg_len is %zu\n", msg_len);
-
-		if (copy_from_user(hdr->data, msg, msg_len)) {
-			BMA_LOG(DLOG_ERROR, "copy_from_user error\n");
-			ret = -EFAULT;
-			goto end;
-		}
+	hdr = (struct edma_msg_hdr_s *)(phost->msg_send_buf + phost->msg_send_write);
+	edma_msg_hdr_init(hdr, priv, msg_buf, msg_len);
 
-		phost->msg_send_write += total_len;
-		phost->statistics.send_bytes += total_len;
-		phost->statistics.send_pkgs++;
+	phost->msg_send_write += total_len;
+	phost->statistics.send_bytes += total_len;
+	phost->statistics.send_pkgs++;
 #ifdef EDMA_TIMER
-		(void)mod_timer(&phost->timer, jiffies_64);
+	(void)mod_timer(&phost->timer, jiffies_64);
 #endif
-		BMA_LOG(DLOG_DEBUG, "msg_send_write = %d\n",
-			phost->msg_send_write);
-
-		ret = msg_len;
-		goto end;
-	} else {
-		BMA_LOG(DLOG_DEBUG,
-			"msg lost,msg_send_write: %d,msg_len:%d,max_len: %d\n",
-			phost->msg_send_write, total_len,
-			HOST_MAX_SEND_MBX_LEN);
-		ret = -ENOSPC;
-		goto end;
-	}
+	BMA_LOG(DLOG_DEBUG, "msg_send_write = %d\n", phost->msg_send_write);
 
-end:
+	ret = msg_len;
 	spin_unlock_irqrestore(&g_bma_dev->edma_host.send_msg_lock, flags);
+	kfree(msg_buf);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(bma_cdev_add_msg);
diff --git a/drivers/net/ethernet/huawei/bma/edma_drv/bma_pci.h b/drivers/net/ethernet/huawei/bma/edma_drv/bma_pci.h
index 92893b6bfd08..639ed4e58a8b 100644
--- a/drivers/net/ethernet/huawei/bma/edma_drv/bma_pci.h
+++ b/drivers/net/ethernet/huawei/bma/edma_drv/bma_pci.h
@@ -71,7 +71,7 @@ struct bma_pci_dev_s {
 #ifdef DRV_VERSION
 #define BMA_VERSION MICRO_TO_STR(DRV_VERSION)
 #else
-#define BMA_VERSION "0.3.4"
+#define BMA_VERSION "0.3.5"
 #endif
 
 #ifdef CONFIG_ARM64
diff --git a/drivers/net/ethernet/huawei/bma/kbox_drv/kbox_include.h b/drivers/net/ethernet/huawei/bma/kbox_drv/kbox_include.h
index ffadf3727734..b027306e52c1 100644
--- a/drivers/net/ethernet/huawei/bma/kbox_drv/kbox_include.h
+++ b/drivers/net/ethernet/huawei/bma/kbox_drv/kbox_include.h
@@ -23,7 +23,7 @@
 #ifdef DRV_VERSION
 #define KBOX_VERSION MICRO_TO_STR(DRV_VERSION)
 #else
-#define KBOX_VERSION "0.3.4"
+#define KBOX_VERSION "0.3.5"
 #endif
 
 #define UNUSED(x) (x = x)
diff --git a/drivers/net/ethernet/huawei/bma/kbox_drv/kbox_panic.c b/drivers/net/ethernet/huawei/bma/kbox_drv/kbox_panic.c
index 0c17cd2bae49..2b142ae9bff6 100644
--- a/drivers/net/ethernet/huawei/bma/kbox_drv/kbox_panic.c
+++ b/drivers/net/ethernet/huawei/bma/kbox_drv/kbox_panic.c
@@ -135,7 +135,7 @@ int kbox_panic_init(void)
 	int ret = KBOX_TRUE;
 
 	g_panic_info_buf = kmalloc(SLOT_LENGTH, GFP_KERNEL);
-	if (IS_ERR(g_panic_info_buf) || !g_panic_info_buf) {
+	if (!g_panic_info_buf) {
 		KBOX_MSG("kmalloc g_panic_info_buf fail!\n");
 		ret = -ENOMEM;
 		goto fail;
@@ -144,7 +144,7 @@ int kbox_panic_init(void)
 	memset(g_panic_info_buf, 0, SLOT_LENGTH);
 
 	g_panic_info_buf_tmp = kmalloc(SLOT_LENGTH, GFP_KERNEL);
-	if (IS_ERR(g_panic_info_buf_tmp) || !g_panic_info_buf_tmp) {
+	if (!g_panic_info_buf_tmp) {
 		KBOX_MSG("kmalloc g_panic_info_buf_tmp fail!\n");
 		ret = -ENOMEM;
 		goto fail;
diff --git a/drivers/net/ethernet/huawei/bma/kbox_drv/kbox_printk.c b/drivers/net/ethernet/huawei/bma/kbox_drv/kbox_printk.c
index 3b04ba206108..630a1e16ea24 100644
--- a/drivers/net/ethernet/huawei/bma/kbox_drv/kbox_printk.c
+++ b/drivers/net/ethernet/huawei/bma/kbox_drv/kbox_printk.c
@@ -304,7 +304,7 @@ int kbox_printk_init(int kbox_proc_exist)
 
 	g_printk_info_buf = kmalloc(SECTION_PRINTK_LEN, GFP_KERNEL);
-	if (IS_ERR(g_printk_info_buf) || !g_printk_info_buf) {
+	if (!g_printk_info_buf) {
 		KBOX_MSG("kmalloc g_printk_info_buf fail!\n");
 		ret = -ENOMEM;
 		goto fail;
@@ -314,7 +314,7 @@ int kbox_printk_init(int kbox_proc_exist)
 
 	g_printk_info_buf_tmp = kmalloc(SECTION_PRINTK_LEN, GFP_KERNEL);
-	if (IS_ERR(g_printk_info_buf_tmp) || !g_printk_info_buf_tmp) {
+	if (!g_printk_info_buf_tmp) {
 		KBOX_MSG("kmalloc g_printk_info_buf_tmp fail!\n");
 		ret = -ENOMEM;
 		goto fail;
diff --git a/drivers/net/ethernet/huawei/bma/kbox_drv/kbox_ram_op.c b/drivers/net/ethernet/huawei/bma/kbox_drv/kbox_ram_op.c
index 49690bab1cef..9f6dfe55e3fb 100644
--- a/drivers/net/ethernet/huawei/bma/kbox_drv/kbox_ram_op.c
+++ b/drivers/net/ethernet/huawei/bma/kbox_drv/kbox_ram_op.c
@@ -432,7 +432,7 @@ int kbox_write_op(long long offset, unsigned int count,
 		return KBOX_FALSE;
 
 	temp_buf_char = kmalloc(TEMP_BUF_DATA_SIZE, GFP_KERNEL);
-	if (!temp_buf_char || IS_ERR(temp_buf_char)) {
+	if (!temp_buf_char) {
 		KBOX_MSG("kmalloc temp_buf_char fail!\n");
 		up(&user_sem);
 		return -ENOMEM;
diff --git a/drivers/net/ethernet/huawei/bma/veth_drv/veth_hb.c b/drivers/net/ethernet/huawei/bma/veth_drv/veth_hb.c
index 240db31d7178..7705bb919ead 100644
--- a/drivers/net/ethernet/huawei/bma/veth_drv/veth_hb.c
+++ b/drivers/net/ethernet/huawei/bma/veth_drv/veth_hb.c
@@ -638,6 +638,7 @@ s32 veth_refill_rxskb(struct bspveth_rxtx_q *prx_queue, int queue)
 		next_to_fill = (next_to_fill + 1) & BSPVETH_POINT_MASK;
 	}
 
+	mb();/* memory barriers. */
 	prx_queue->next_to_fill = next_to_fill;
 
 	tail = prx_queue->tail;
@@ -672,6 +673,7 @@ s32 bspveth_setup_rx_skb(struct bspveth_device *pvethdev,
 	if (!idx) /* Can't alloc even one packets */
 		return -EFAULT;
 
+	mb();/* memory barriers. */
 	prx_queue->next_to_fill = idx;
 
 	VETH_LOG(DLOG_DEBUG, "prx_queue->next_to_fill=%d\n",
@@ -886,8 +888,6 @@ s32 bspveth_setup_all_rx_resources(struct bspveth_device *pvethdev)
 		err = bspveth_setup_rx_resources(pvethdev,
 						 pvethdev->prx_queue[qid]);
 		if (err) {
-			kfree(pvethdev->prx_queue[qid]);
-			pvethdev->prx_queue[qid] = NULL;
 			VETH_LOG(DLOG_ERROR,
 				 "Allocation for Rx Queue %u failed\n", qid);
 
@@ -1328,6 +1328,7 @@ s32 veth_send_one_pkt(struct sk_buff *skb, int queue)
 	pbd_v->off = off;
 	pbd_v->len = skb->len;
 
+	mb();/* memory barriers. */
 	head = (head + 1) & BSPVETH_POINT_MASK;
 	ptx_queue->head = head;
 
@@ -1424,6 +1425,7 @@ s32 veth_free_txskb(struct bspveth_rxtx_q *ptx_queue, int queue)
 		next_to_free = (next_to_free + 1) & BSPVETH_POINT_MASK;
 	}
 
+	mb(); /* memory barriers. */
 	ptx_queue->next_to_free = next_to_free;
 	tail = ptx_queue->tail;
 
@@ -1522,6 +1524,7 @@ s32 veth_recv_pkt(struct bspveth_rxtx_q *prx_queue, int queue)
 		}
 	}
 
+	mb();/* memory barriers. */
 	prx_queue->tail = tail;
 	head = prx_queue->head;
 
diff --git a/drivers/net/ethernet/huawei/bma/veth_drv/veth_hb.h b/drivers/net/ethernet/huawei/bma/veth_drv/veth_hb.h
index 9a4d699e6421..503636549320 100644
--- a/drivers/net/ethernet/huawei/bma/veth_drv/veth_hb.h
+++ b/drivers/net/ethernet/huawei/bma/veth_drv/veth_hb.h
@@ -31,7 +31,7 @@ extern "C" {
 #ifdef DRV_VERSION
 #define VETH_VERSION MICRO_TO_STR(DRV_VERSION)
 #else
-#define VETH_VERSION "0.3.4"
+#define VETH_VERSION "0.3.5"
 #endif
 
 #define MODULE_NAME "veth"
From: Peter Zijlstra <peterz@infradead.org>

mainline inclusion
from mainline-v6.2-rc1
commit 97e3d26b5e5f371b3ee223d94dd123e6c442ba80
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6C6UC
CVE: CVE-2023-0597
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Seth found that the CPU-entry-area (the piece of per-cpu data that is mapped into the userspace page-tables for kPTI) is not subject to any randomization, irrespective of kASLR settings.
On x86_64 a whole P4D (512 GB) of virtual address space is reserved for this structure, which is plenty large enough to randomize things a little.
As such, use a straightforward randomization scheme that avoids duplicates to spread the existing CPUs over the available space.
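The per-CPU offset is drawn with the multiply-shift trick, which maps a 32-bit random value uniformly onto [0, bound) without a division; a standalone sketch of the arithmetic used by init_cea_offsets() in the diff below:

	#include <stdint.h>

	/* r * bound lies in [0, bound * 2^32), so its top 32 bits lie in
	 * [0, bound); same arithmetic as the patch's
	 * cea = (u32)(((u64)get_random_u32() * max_cea) >> 32);
	 */
	static uint32_t bounded_random(uint32_t r, uint32_t bound)
	{
		return (uint32_t)(((uint64_t)r * bound) >> 32);
	}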
[ bp: Fix le build. ]
Reported-by: Seth Jenkins <sethjenkins@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>

Conflicts:
	arch/x86/mm/cpu_entry_area.c

Use get_random_u32() instead of prandom_u32_max() in init_cea_offsets().

With CONFIG_RANDOMIZE_BASE=y, KASLR initializes the prandom seed via prandom_seed_state() before init_cea_offsets() runs. But with CONFIG_RANDOMIZE_BASE=n, the prandom seed is initialized after init_cea_offsets(), which causes cea to always be 0.

Commit d4150779e60f ("random32: use real rng for non-deterministic randomness") uses get_random_u32() instead of prandom_u32() in prandom_u32_max(), which means prandom_u32_max() no longer needs to wait for the prandom seed to be initialized. But that patch has many prerequisite patches that have not been merged, so we adopt the current solution as a workaround: directly use get_random_u32() in init_cea_offsets() to simplify the code.

Signed-off-by: Ke Liu <liuke94@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
---
 arch/x86/include/asm/cpu_entry_area.h |  4 ---
 arch/x86/include/asm/pgtable_areas.h  |  8 ++++-
 arch/x86/kernel/hw_breakpoint.c       |  2 +-
 arch/x86/mm/cpu_entry_area.c          | 51 ++++++++++++++++++++++++---
 4 files changed, 55 insertions(+), 10 deletions(-)
diff --git a/arch/x86/include/asm/cpu_entry_area.h b/arch/x86/include/asm/cpu_entry_area.h
index dd5ea1bdf04c..e2c04a5015b0 100644
--- a/arch/x86/include/asm/cpu_entry_area.h
+++ b/arch/x86/include/asm/cpu_entry_area.h
@@ -130,10 +130,6 @@ struct cpu_entry_area {
 };
 
 #define CPU_ENTRY_AREA_SIZE		(sizeof(struct cpu_entry_area))
-#define CPU_ENTRY_AREA_ARRAY_SIZE	(CPU_ENTRY_AREA_SIZE * NR_CPUS)
-
-/* Total size includes the readonly IDT mapping page as well: */
-#define CPU_ENTRY_AREA_TOTAL_SIZE	(CPU_ENTRY_AREA_ARRAY_SIZE + PAGE_SIZE)
 
 DECLARE_PER_CPU(struct cpu_entry_area *, cpu_entry_area);
 DECLARE_PER_CPU(struct cea_exception_stacks *, cea_exception_stacks);
diff --git a/arch/x86/include/asm/pgtable_areas.h b/arch/x86/include/asm/pgtable_areas.h
index d34cce1b995c..4f056fb88174 100644
--- a/arch/x86/include/asm/pgtable_areas.h
+++ b/arch/x86/include/asm/pgtable_areas.h
@@ -11,6 +11,12 @@
 
 #define CPU_ENTRY_AREA_RO_IDT_VADDR	((void *)CPU_ENTRY_AREA_RO_IDT)
 
-#define CPU_ENTRY_AREA_MAP_SIZE	(CPU_ENTRY_AREA_PER_CPU + CPU_ENTRY_AREA_ARRAY_SIZE - CPU_ENTRY_AREA_BASE)
+#ifdef CONFIG_X86_32
+#define CPU_ENTRY_AREA_MAP_SIZE		(CPU_ENTRY_AREA_PER_CPU +	\
+					 (CPU_ENTRY_AREA_SIZE * NR_CPUS) - \
+					 CPU_ENTRY_AREA_BASE)
+#else
+#define CPU_ENTRY_AREA_MAP_SIZE		P4D_SIZE
+#endif
 
 #endif /* _ASM_X86_PGTABLE_AREAS_H */
diff --git a/arch/x86/kernel/hw_breakpoint.c b/arch/x86/kernel/hw_breakpoint.c
index 668a4a6533d9..bbb0f737aab1 100644
--- a/arch/x86/kernel/hw_breakpoint.c
+++ b/arch/x86/kernel/hw_breakpoint.c
@@ -266,7 +266,7 @@ static inline bool within_cpu_entry(unsigned long addr, unsigned long end)
 
 	/* CPU entry erea is always used for CPU entry */
 	if (within_area(addr, end, CPU_ENTRY_AREA_BASE,
-			CPU_ENTRY_AREA_TOTAL_SIZE))
+			CPU_ENTRY_AREA_MAP_SIZE))
 		return true;
 
 	/*
diff --git a/arch/x86/mm/cpu_entry_area.c b/arch/x86/mm/cpu_entry_area.c
index 6c2f1b76a0b6..98396a67958c 100644
--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -5,6 +5,7 @@
 #include <linux/kallsyms.h>
 #include <linux/kcore.h>
 #include <linux/pgtable.h>
+#include <linux/random.h>
 
 #include <asm/cpu_entry_area.h>
 #include <asm/fixmap.h>
@@ -15,16 +16,57 @@ static DEFINE_PER_CPU_PAGE_ALIGNED(struct entry_stack_page, entry_stack_storage)
 #ifdef CONFIG_X86_64
 static DEFINE_PER_CPU_PAGE_ALIGNED(struct exception_stacks, exception_stacks);
 DEFINE_PER_CPU(struct cea_exception_stacks*, cea_exception_stacks);
-#endif
 
-#ifdef CONFIG_X86_32
+static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, _cea_offset);
+
+static __always_inline unsigned int cea_offset(unsigned int cpu)
+{
+	return per_cpu(_cea_offset, cpu);
+}
+
+static __init void init_cea_offsets(void)
+{
+	unsigned int max_cea;
+	unsigned int i, j;
+
+	max_cea = (CPU_ENTRY_AREA_MAP_SIZE - PAGE_SIZE) / CPU_ENTRY_AREA_SIZE;
+
+	/* O(sodding terrible) */
+	for_each_possible_cpu(i) {
+		unsigned int cea;
+
+again:
+		/*
+		 * Directly use get_random_u32() instead of prandom_u32_max
+		 * to avoid seed can't be generated when CONFIG_RANDOMIZE_BASE=n.
+		 */
+		cea = (u32)(((u64) get_random_u32() * max_cea) >> 32);
+
+		for_each_possible_cpu(j) {
+			if (cea_offset(j) == cea)
+				goto again;
+
+			if (i == j)
+				break;
+		}
+
+		per_cpu(_cea_offset, i) = cea;
+	}
+}
+#else /* !X86_64 */
 DECLARE_PER_CPU_PAGE_ALIGNED(struct doublefault_stack, doublefault_stack);
+
+static __always_inline unsigned int cea_offset(unsigned int cpu)
+{
+	return cpu;
+}
+static inline void init_cea_offsets(void) { }
 #endif
 
 /* Is called from entry code, so must be noinstr */
 noinstr struct cpu_entry_area *get_cpu_entry_area(int cpu)
 {
-	unsigned long va = CPU_ENTRY_AREA_PER_CPU + cpu * CPU_ENTRY_AREA_SIZE;
+	unsigned long va = CPU_ENTRY_AREA_PER_CPU + cea_offset(cpu) * CPU_ENTRY_AREA_SIZE;
 	BUILD_BUG_ON(sizeof(struct cpu_entry_area) % PAGE_SIZE != 0);
 
 	return (struct cpu_entry_area *) va;
@@ -205,7 +247,6 @@ static __init void setup_cpu_entry_area_ptes(void)
 
 	/* The +1 is for the readonly IDT: */
 	BUILD_BUG_ON((CPU_ENTRY_AREA_PAGES+1)*PAGE_SIZE != CPU_ENTRY_AREA_MAP_SIZE);
-	BUILD_BUG_ON(CPU_ENTRY_AREA_TOTAL_SIZE != CPU_ENTRY_AREA_MAP_SIZE);
 	BUG_ON(CPU_ENTRY_AREA_BASE & ~PMD_MASK);
 
 	start = CPU_ENTRY_AREA_BASE;
@@ -221,6 +262,8 @@ void __init setup_cpu_entry_areas(void)
 {
 	unsigned int cpu;
 
+	init_cea_offsets();
+
 	setup_cpu_entry_area_ptes();
 
 	for_each_possible_cpu(cpu)
From: Andrey Ryabinin <ryabinin.a.a@gmail.com>

mainline inclusion
from mainline-v6.2-rc1
commit 3f148f3318140035e87decc1214795ff0755757b
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6C6UC
CVE: CVE-2023-0597
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
KASAN maps shadow for the entire CPU-entry-area: [CPU_ENTRY_AREA_BASE, CPU_ENTRY_AREA_BASE + CPU_ENTRY_AREA_MAP_SIZE]
This will explode once the per-cpu entry areas are randomized since it will increase CPU_ENTRY_AREA_MAP_SIZE to 512 GB and KASAN fails to allocate shadow for such big area.
Fix this by allocating KASAN shadow only for really used cpu entry area addresses mapped by cea_map_percpu_pages()
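For background, KASAN keeps one shadow byte per 8 bytes of address space, so shadow for the whole 512 GB CPU-entry-area would itself be 64 GB. A sketch of the address-to-shadow mapping (the constants are the usual x86-64 KASAN parameters, stated here as assumptions):

	/* kasan_mem_to_shadow(), schematically: */
	static inline void *mem_to_shadow(unsigned long addr,
					  unsigned long kasan_shadow_offset)
	{
		return (void *)((addr >> 3) + kasan_shadow_offset);
	}

Populating shadow only for the addresses actually mapped keeps the allocation proportional to what the kernel really uses.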
Thanks to the 0day folks for finding and reporting this to be an issue.
[ dhansen: tweak changelog since this will get committed before peterz's actual cpu-entry-area randomization ]
Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Yujie Liu <yujie.liu@intel.com>
Cc: kernel test robot <yujie.liu@intel.com>
Link: https://lore.kernel.org/r/202210241508.2e203c3d-yujie.liu@intel.com
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
---
 arch/x86/include/asm/kasan.h |  3 +++
 arch/x86/mm/cpu_entry_area.c |  8 +++++++-
 arch/x86/mm/kasan_init_64.c  | 15 ++++++++++++---
 3 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kasan.h b/arch/x86/include/asm/kasan.h
index 13e70da38bed..de75306b932e 100644
--- a/arch/x86/include/asm/kasan.h
+++ b/arch/x86/include/asm/kasan.h
@@ -28,9 +28,12 @@
 #ifdef CONFIG_KASAN
 void __init kasan_early_init(void);
 void __init kasan_init(void);
+void __init kasan_populate_shadow_for_vaddr(void *va, size_t size, int nid);
 #else
 static inline void kasan_early_init(void) { }
 static inline void kasan_init(void) { }
+static inline void kasan_populate_shadow_for_vaddr(void *va, size_t size,
+						   int nid) { }
 #endif
 
 #endif
diff --git a/arch/x86/mm/cpu_entry_area.c b/arch/x86/mm/cpu_entry_area.c
index 98396a67958c..bf5786f3f3c4 100644
--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -10,6 +10,7 @@
 #include <asm/cpu_entry_area.h>
 #include <asm/fixmap.h>
 #include <asm/desc.h>
+#include <asm/kasan.h>
 
 static DEFINE_PER_CPU_PAGE_ALIGNED(struct entry_stack_page, entry_stack_storage);
 
@@ -95,8 +96,13 @@ void cea_set_pte(void *cea_vaddr, phys_addr_t pa, pgprot_t flags)
 static void __init
 cea_map_percpu_pages(void *cea_vaddr, void *ptr, int pages, pgprot_t prot)
 {
+	phys_addr_t pa = per_cpu_ptr_to_phys(ptr);
+
+	kasan_populate_shadow_for_vaddr(cea_vaddr, pages * PAGE_SIZE,
+					early_pfn_to_nid(PFN_DOWN(pa)));
+
 	for ( ; pages; pages--, cea_vaddr+= PAGE_SIZE, ptr += PAGE_SIZE)
-		cea_set_pte(cea_vaddr, per_cpu_ptr_to_phys(ptr), prot);
+		cea_set_pte(cea_vaddr, pa, prot);
 }
 
 static void __init percpu_setup_debug_store(unsigned int cpu)
diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c
index 1a50434c8a4d..18d574af30ef 100644
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -318,6 +318,18 @@ void __init kasan_early_init(void)
 	kasan_map_early_shadow(init_top_pgt);
 }
 
+void __init kasan_populate_shadow_for_vaddr(void *va, size_t size, int nid)
+{
+	unsigned long shadow_start, shadow_end;
+
+	shadow_start = (unsigned long)kasan_mem_to_shadow(va);
+	shadow_start = round_down(shadow_start, PAGE_SIZE);
+	shadow_end = (unsigned long)kasan_mem_to_shadow(va + size);
+	shadow_end = round_up(shadow_end, PAGE_SIZE);
+
+	kasan_populate_shadow(shadow_start, shadow_end, nid);
+}
+
 void __init kasan_init(void)
 {
 	int i;
@@ -395,9 +407,6 @@ void __init kasan_init(void)
 		kasan_mem_to_shadow((void *)VMALLOC_END + 1),
 		shadow_cpu_entry_begin);
 
-	kasan_populate_shadow((unsigned long)shadow_cpu_entry_begin,
-			      (unsigned long)shadow_cpu_entry_end, 0);
-
 	kasan_populate_early_shadow(shadow_cpu_entry_end,
 			kasan_mem_to_shadow((void *)__START_KERNEL_map));
From: Sean Christopherson <seanjc@google.com>

mainline inclusion
from mainline-v6.2-rc1
commit 80d72a8f76e8f3f0b5a70b8c7022578e17bde8e7
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6C6UC
CVE: CVE-2023-0597
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Recompute the physical address for each per-CPU page in the CPU entry area; a recent commit inadvertently modified cea_map_percpu_pages() such that every PTE is mapped to the physical address of the first page.
Fixes: 9fd429c28073 ("x86/kasan: Map shadow for percpu pages on demand")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Link: https://lkml.kernel.org/r/20221110203504.1985010-2-seanjc@google.com
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
---
 arch/x86/mm/cpu_entry_area.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/cpu_entry_area.c b/arch/x86/mm/cpu_entry_area.c
index bf5786f3f3c4..2eaf9ee41495 100644
--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -102,7 +102,7 @@ cea_map_percpu_pages(void *cea_vaddr, void *ptr, int pages, pgprot_t prot)
 			early_pfn_to_nid(PFN_DOWN(pa)));
 
 	for ( ; pages; pages--, cea_vaddr+= PAGE_SIZE, ptr += PAGE_SIZE)
-		cea_set_pte(cea_vaddr, pa, prot);
+		cea_set_pte(cea_vaddr, per_cpu_ptr_to_phys(ptr), prot);
 }
 
 static void __init percpu_setup_debug_store(unsigned int cpu)
static void __init percpu_setup_debug_store(unsigned int cpu)
From: Sean Christopherson <seanjc@google.com>

mainline inclusion
from mainline-v6.2-rc1
commit 97650148a15e0b30099d6175ffe278b9f55ec66a
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6C6UC
CVE: CVE-2023-0597
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Populate a KASAN shadow for the entire possible per-CPU range of the CPU entry area instead of requiring that each individual chunk map a shadow. Mapping shadows individually is error prone, e.g. the per-CPU GDT mapping was left behind, which can lead to not-present page faults during KASAN validation if the kernel performs a software lookup into the GDT. The DS buffer is also likely affected.
The motivation for mapping the per-CPU areas on-demand was to avoid mapping the entire 512GiB range that's reserved for the CPU entry area, shaving a few bytes by not creating shadows for potentially unused memory was not a goal.
The bug is most easily reproduced by doing a sigreturn with a garbage CS in the sigcontext, e.g.
	int main(void)
	{
		struct sigcontext regs;

		syscall(__NR_mmap, 0x1ffff000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul);
		syscall(__NR_mmap, 0x20000000ul, 0x1000000ul, 7ul, 0x32ul, -1, 0ul);
		syscall(__NR_mmap, 0x21000000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul);

		memset(&regs, 0, sizeof(regs));
		regs.cs = 0x1d0;
		syscall(__NR_rt_sigreturn);
		return 0;
	}
to coerce the kernel into doing a GDT lookup to compute CS.base when reading the instruction bytes on the subsequent #GP to determine whether or not the #GP is something the kernel should handle, e.g. to fixup UMIP violations or to emulate CLI/STI for IOPL=3 applications.
BUG: unable to handle page fault for address: fffffbc8379ace00
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 16c03a067 P4D 16c03a067 PUD 15b990067 PMD 15b98f067 PTE 0
Oops: 0000 [#1] PREEMPT SMP KASAN
CPU: 3 PID: 851 Comm: r2 Not tainted 6.1.0-rc3-next-20221103+ #432
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:kasan_check_range+0xdf/0x190
Call Trace:
 <TASK>
 get_desc+0xb0/0x1d0
 insn_get_seg_base+0x104/0x270
 insn_fetch_from_user+0x66/0x80
 fixup_umip_exception+0xb1/0x530
 exc_general_protection+0x181/0x210
 asm_exc_general_protection+0x22/0x30
RIP: 0003:0x0
Code: Unable to access opcode bytes at 0xffffffffffffffd6.
RSP: 0003:0000000000000000 EFLAGS: 00000202
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000000001d0
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
 </TASK>
Fixes: 9fd429c28073 ("x86/kasan: Map shadow for percpu pages on demand")
Reported-by: syzbot+ffb4f000dc2872c93f62@syzkaller.appspotmail.com
Suggested-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Link: https://lkml.kernel.org/r/20221110203504.1985010-3-seanjc@google.com
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
---
 arch/x86/mm/cpu_entry_area.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/x86/mm/cpu_entry_area.c b/arch/x86/mm/cpu_entry_area.c
index 2eaf9ee41495..88e2cc4d4e75 100644
--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -96,11 +96,6 @@ void cea_set_pte(void *cea_vaddr, phys_addr_t pa, pgprot_t flags)
 static void __init
 cea_map_percpu_pages(void *cea_vaddr, void *ptr, int pages, pgprot_t prot)
 {
-	phys_addr_t pa = per_cpu_ptr_to_phys(ptr);
-
-	kasan_populate_shadow_for_vaddr(cea_vaddr, pages * PAGE_SIZE,
-					early_pfn_to_nid(PFN_DOWN(pa)));
-
 	for ( ; pages; pages--, cea_vaddr+= PAGE_SIZE, ptr += PAGE_SIZE)
 		cea_set_pte(cea_vaddr, per_cpu_ptr_to_phys(ptr), prot);
 }
@@ -200,6 +195,9 @@ static void __init setup_cpu_entry_area(unsigned int cpu)
 	pgprot_t tss_prot = PAGE_KERNEL;
 #endif
 
+	kasan_populate_shadow_for_vaddr(cea, CPU_ENTRY_AREA_SIZE,
+					early_cpu_to_node(cpu));
+
 	cea_set_pte(&cea->gdt, get_cpu_gdt_paddr(cpu), gdt_prot);
 
 	cea_map_percpu_pages(&cea->entry_stack_page,
From: Sean Christopherson seanjc@google.com
mainline inclusion from mainline-v6.2-rc1 commit 7077d2ccb94dafd00b29cc2d601c9f6891648f5b category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6C6UC CVE: CVE-2023-0597
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Rename the CPU entry area variables in kasan_init() to shorten their names; a future fix will reference the beginning of the per-CPU portion of the CPU entry area, and shadow_cpu_entry_per_cpu_begin is a bit much.
No functional change intended.
Signed-off-by: Sean Christopherson seanjc@google.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Andrey Ryabinin ryabinin.a.a@gmail.com Link: https://lkml.kernel.org/r/20221110203504.1985010-4-seanjc@google.com Signed-off-by: Tong Tiangen tongtiangen@huawei.com Reviewed-by: Chen Wandun chenwandun@huawei.com Reviewed-by: Wang Weiyang wangweiyang2@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- arch/x86/mm/kasan_init_64.c | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c index 18d574af30ef..75b0eb482487 100644 --- a/arch/x86/mm/kasan_init_64.c +++ b/arch/x86/mm/kasan_init_64.c @@ -333,7 +333,7 @@ void __init kasan_populate_shadow_for_vaddr(void *va, size_t size, int nid) void __init kasan_init(void) { int i; - void *shadow_cpu_entry_begin, *shadow_cpu_entry_end; + void *shadow_cea_begin, *shadow_cea_end;
memcpy(early_top_pgt, init_top_pgt, sizeof(early_top_pgt));
@@ -374,16 +374,16 @@ void __init kasan_init(void) map_range(&pfn_mapped[i]); }
- shadow_cpu_entry_begin = (void *)CPU_ENTRY_AREA_BASE; - shadow_cpu_entry_begin = kasan_mem_to_shadow(shadow_cpu_entry_begin); - shadow_cpu_entry_begin = (void *)round_down( - (unsigned long)shadow_cpu_entry_begin, PAGE_SIZE); + shadow_cea_begin = (void *)CPU_ENTRY_AREA_BASE; + shadow_cea_begin = kasan_mem_to_shadow(shadow_cea_begin); + shadow_cea_begin = (void *)round_down( + (unsigned long)shadow_cea_begin, PAGE_SIZE);
- shadow_cpu_entry_end = (void *)(CPU_ENTRY_AREA_BASE + + shadow_cea_end = (void *)(CPU_ENTRY_AREA_BASE + CPU_ENTRY_AREA_MAP_SIZE); - shadow_cpu_entry_end = kasan_mem_to_shadow(shadow_cpu_entry_end); - shadow_cpu_entry_end = (void *)round_up( - (unsigned long)shadow_cpu_entry_end, PAGE_SIZE); + shadow_cea_end = kasan_mem_to_shadow(shadow_cea_end); + shadow_cea_end = (void *)round_up( + (unsigned long)shadow_cea_end, PAGE_SIZE);
kasan_populate_early_shadow( kasan_mem_to_shadow((void *)PAGE_OFFSET + MAXMEM), @@ -405,9 +405,9 @@ void __init kasan_init(void)
kasan_populate_early_shadow( kasan_mem_to_shadow((void *)VMALLOC_END + 1), - shadow_cpu_entry_begin); + shadow_cea_begin);
- kasan_populate_early_shadow(shadow_cpu_entry_end, + kasan_populate_early_shadow(shadow_cea_end, kasan_mem_to_shadow((void *)__START_KERNEL_map));
kasan_populate_shadow((unsigned long)kasan_mem_to_shadow(_stext),
From: Sean Christopherson seanjc@google.com
mainline inclusion from mainline-v6.2-rc1 commit bde258d97409f2a45243cb393a55ea9ecfc7aba5 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6C6UC CVE: CVE-2023-0597
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Add helpers to dedup the code that aligns a shadow address up/down to page boundaries when translating an address to its shadow.
No functional change intended.
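For readers unfamiliar with the translation the helpers wrap: generic KASAN maps each 8 bytes of memory to one shadow byte, i.e. shadow = (addr >> 3) + offset. Below is a rough user-space sketch, with the offset and base values hard-coded as stand-ins for the real, configuration-dependent kernel constants:

  #include <stdio.h>

  #define PAGE_SIZE           4096UL
  #define KASAN_SHADOW_SCALE  3
  #define KASAN_SHADOW_OFFSET 0xdffffc0000000000UL /* stand-in value */

  static unsigned long mem_to_shadow(unsigned long va)
  {
          return (va >> KASAN_SHADOW_SCALE) + KASAN_SHADOW_OFFSET;
  }

  static unsigned long mem_to_shadow_align_down(unsigned long va)
  {
          return mem_to_shadow(va) & ~(PAGE_SIZE - 1);
  }

  static unsigned long mem_to_shadow_align_up(unsigned long va)
  {
          return (mem_to_shadow(va) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
  }

  int main(void)
  {
          /* Stand-ins for CPU_ENTRY_AREA_BASE and its 512GiB map size. */
          unsigned long base = 0xfffffe0000000000UL;
          unsigned long size = 512UL << 30;

          printf("shadow begin %#lx\n", mem_to_shadow_align_down(base));
          printf("shadow end   %#lx\n", mem_to_shadow_align_up(base + size));
          return 0;
  }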
Signed-off-by: Sean Christopherson seanjc@google.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Andrey Ryabinin ryabinin.a.a@gmail.com Link: https://lkml.kernel.org/r/20221110203504.1985010-5-seanjc@google.com Signed-off-by: Tong Tiangen tongtiangen@huawei.com Reviewed-by: Chen Wandun chenwandun@huawei.com Reviewed-by: Wang Weiyang wangweiyang2@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- arch/x86/mm/kasan_init_64.c | 40 ++++++++++++++++++++----------------- 1 file changed, 22 insertions(+), 18 deletions(-)
diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c index 75b0eb482487..c1d99556c024 100644 --- a/arch/x86/mm/kasan_init_64.c +++ b/arch/x86/mm/kasan_init_64.c @@ -318,22 +318,33 @@ void __init kasan_early_init(void) kasan_map_early_shadow(init_top_pgt); }
+static unsigned long kasan_mem_to_shadow_align_down(unsigned long va) +{ + unsigned long shadow = (unsigned long)kasan_mem_to_shadow((void *)va); + + return round_down(shadow, PAGE_SIZE); +} + +static unsigned long kasan_mem_to_shadow_align_up(unsigned long va) +{ + unsigned long shadow = (unsigned long)kasan_mem_to_shadow((void *)va); + + return round_up(shadow, PAGE_SIZE); +} + void __init kasan_populate_shadow_for_vaddr(void *va, size_t size, int nid) { unsigned long shadow_start, shadow_end;
- shadow_start = (unsigned long)kasan_mem_to_shadow(va); - shadow_start = round_down(shadow_start, PAGE_SIZE); - shadow_end = (unsigned long)kasan_mem_to_shadow(va + size); - shadow_end = round_up(shadow_end, PAGE_SIZE); - + shadow_start = kasan_mem_to_shadow_align_down((unsigned long)va); + shadow_end = kasan_mem_to_shadow_align_up((unsigned long)va + size); kasan_populate_shadow(shadow_start, shadow_end, nid); }
void __init kasan_init(void) { + unsigned long shadow_cea_begin, shadow_cea_end; int i; - void *shadow_cea_begin, *shadow_cea_end;
memcpy(early_top_pgt, init_top_pgt, sizeof(early_top_pgt));
@@ -374,16 +385,9 @@ void __init kasan_init(void) map_range(&pfn_mapped[i]); }
- shadow_cea_begin = (void *)CPU_ENTRY_AREA_BASE; - shadow_cea_begin = kasan_mem_to_shadow(shadow_cea_begin); - shadow_cea_begin = (void *)round_down( - (unsigned long)shadow_cea_begin, PAGE_SIZE); - - shadow_cea_end = (void *)(CPU_ENTRY_AREA_BASE + - CPU_ENTRY_AREA_MAP_SIZE); - shadow_cea_end = kasan_mem_to_shadow(shadow_cea_end); - shadow_cea_end = (void *)round_up( - (unsigned long)shadow_cea_end, PAGE_SIZE); + shadow_cea_begin = kasan_mem_to_shadow_align_down(CPU_ENTRY_AREA_BASE); + shadow_cea_end = kasan_mem_to_shadow_align_up(CPU_ENTRY_AREA_BASE + + CPU_ENTRY_AREA_MAP_SIZE);
kasan_populate_early_shadow( kasan_mem_to_shadow((void *)PAGE_OFFSET + MAXMEM), @@ -405,9 +409,9 @@ void __init kasan_init(void)
kasan_populate_early_shadow( kasan_mem_to_shadow((void *)VMALLOC_END + 1), - shadow_cea_begin); + (void *)shadow_cea_begin);
- kasan_populate_early_shadow(shadow_cea_end, + kasan_populate_early_shadow((void *)shadow_cea_end, kasan_mem_to_shadow((void *)__START_KERNEL_map));
kasan_populate_shadow((unsigned long)kasan_mem_to_shadow(_stext),
From: Sean Christopherson seanjc@google.com
mainline inclusion from mainline-v6.2-rc1 commit 1cfaac2400c73378e78182a706be0f3ac8b93cd7 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6C6UC CVE: CVE-2023-0597
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Populate the shadow for the shared portion of the CPU entry area, i.e. the read-only IDT mapping, during KASAN initialization. A recent change modified KASAN to map the per-CPU areas on-demand, but forgot to keep a shadow for the common area that is shared amongst all CPUs.
Map the common area in KASAN init instead of letting idt_map_in_cea() do the dirty work so that it Just Works in the unlikely event more shared data is shoved into the CPU entry area.
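In other words, after this patch only the shared head of the area has its shadow populated eagerly, while the per-CPU tail stays on-demand. A sketch of the split, with stand-in constants (the real definitions live in asm/cpu_entry_area.h and depend on configuration):

  #include <stdio.h>

  #define PAGE_SIZE    4096UL
  #define CEA_BASE     0xfffffe0000000000UL   /* stand-in */
  #define CEA_MAP_SIZE (512UL << 30)          /* stand-in */
  #define CEA_PER_CPU  (CEA_BASE + PAGE_SIZE) /* RO IDT page comes first */

  int main(void)
  {
          /* Shadow populated eagerly at KASAN init: shared (IDT) portion. */
          printf("eager : [%#lx, %#lx)\n", CEA_BASE, CEA_PER_CPU);

          /* Shadow populated on-demand as each CPU's area is set up. */
          printf("demand: [%#lx, %#lx)\n", CEA_PER_CPU,
                 CEA_BASE + CEA_MAP_SIZE);
          return 0;
  }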
The bug manifests as a not-present #PF when software attempts to look up an IDT entry, e.g. when KVM is handling IRQs on Intel CPUs (KVM performs a direct CALL to the IRQ handler to avoid the overhead of INTn):
BUG: unable to handle page fault for address: fffffbc0000001d8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 16c03a067 P4D 16c03a067 PUD 0 Oops: 0000 [#1] PREEMPT SMP KASAN CPU: 5 PID: 901 Comm: repro Tainted: G W 6.1.0-rc3+ #410 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:kasan_check_range+0xdf/0x190 vmx_handle_exit_irqoff+0x152/0x290 [kvm_intel] vcpu_run+0x1d89/0x2bd0 [kvm] kvm_arch_vcpu_ioctl_run+0x3ce/0xa70 [kvm] kvm_vcpu_ioctl+0x349/0x900 [kvm] __x64_sys_ioctl+0xb8/0xf0 do_syscall_64+0x2b/0x50 entry_SYSCALL_64_after_hwframe+0x46/0xb0
Fixes: 9fd429c28073 ("x86/kasan: Map shadow for percpu pages on demand") Reported-by: syzbot+8cdd16fd5a6c0565e227@syzkaller.appspotmail.com Signed-off-by: Sean Christopherson seanjc@google.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Link: https://lkml.kernel.org/r/20221110203504.1985010-6-seanjc@google.com Signed-off-by: Tong Tiangen tongtiangen@huawei.com Reviewed-by: Chen Wandun chenwandun@huawei.com Reviewed-by: Wang Weiyang wangweiyang2@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- arch/x86/mm/kasan_init_64.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c index c1d99556c024..b4b187960dd0 100644 --- a/arch/x86/mm/kasan_init_64.c +++ b/arch/x86/mm/kasan_init_64.c @@ -343,7 +343,7 @@ void __init kasan_populate_shadow_for_vaddr(void *va, size_t size, int nid)
void __init kasan_init(void) { - unsigned long shadow_cea_begin, shadow_cea_end; + unsigned long shadow_cea_begin, shadow_cea_per_cpu_begin, shadow_cea_end; int i;
memcpy(early_top_pgt, init_top_pgt, sizeof(early_top_pgt)); @@ -386,6 +386,7 @@ void __init kasan_init(void) }
shadow_cea_begin = kasan_mem_to_shadow_align_down(CPU_ENTRY_AREA_BASE); + shadow_cea_per_cpu_begin = kasan_mem_to_shadow_align_up(CPU_ENTRY_AREA_PER_CPU); shadow_cea_end = kasan_mem_to_shadow_align_up(CPU_ENTRY_AREA_BASE + CPU_ENTRY_AREA_MAP_SIZE);
@@ -411,6 +412,15 @@ void __init kasan_init(void) kasan_mem_to_shadow((void *)VMALLOC_END + 1), (void *)shadow_cea_begin);
+ /* + * Populate the shadow for the shared portion of the CPU entry area. + * Shadows for the per-CPU areas are mapped on-demand, as each CPU's + * area is randomly placed somewhere in the 512GiB range and mapping + * the entire 512GiB range is prohibitively expensive. + */ + kasan_populate_shadow(shadow_cea_begin, + shadow_cea_per_cpu_begin, 0); + kasan_populate_early_shadow((void *)shadow_cea_end, kasan_mem_to_shadow((void *)__START_KERNEL_map));