From: Yuanzheng Song songyuanzheng@huawei.com
mainline inclusion from mainline-v5.16-rc1 commit 7e6ec49c18988f1b8dab0677271dafde5f8d9a43 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6AW65 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
When reading memcg->socket_pressure in mem_cgroup_under_socket_pressure() and writing memcg->socket_pressure in vmpressure() at the same time, the following data-race occurs:
BUG: KCSAN: data-race in __sk_mem_reduce_allocated / vmpressure
write to 0xffff8881286f4938 of 8 bytes by task 24550 on cpu 3: vmpressure+0x218/0x230 mm/vmpressure.c:307 shrink_node_memcgs+0x2b9/0x410 mm/vmscan.c:2658 shrink_node+0x9d2/0x11d0 mm/vmscan.c:2769 shrink_zones+0x29f/0x470 mm/vmscan.c:2972 do_try_to_free_pages+0x193/0x6e0 mm/vmscan.c:3027 try_to_free_mem_cgroup_pages+0x1c0/0x3f0 mm/vmscan.c:3345 reclaim_high mm/memcontrol.c:2440 [inline] mem_cgroup_handle_over_high+0x18b/0x4d0 mm/memcontrol.c:2624 tracehook_notify_resume include/linux/tracehook.h:197 [inline] exit_to_user_mode_loop kernel/entry/common.c:164 [inline] exit_to_user_mode_prepare+0x110/0x170 kernel/entry/common.c:191 syscall_exit_to_user_mode+0x16/0x30 kernel/entry/common.c:266 ret_from_fork+0x15/0x30 arch/x86/entry/entry_64.S:289
read to 0xffff8881286f4938 of 8 bytes by interrupt on cpu 1: mem_cgroup_under_socket_pressure include/linux/memcontrol.h:1483 [inline] sk_under_memory_pressure include/net/sock.h:1314 [inline] __sk_mem_reduce_allocated+0x1d2/0x270 net/core/sock.c:2696 __sk_mem_reclaim+0x44/0x50 net/core/sock.c:2711 sk_mem_reclaim include/net/sock.h:1490 [inline] ...... net_rx_action+0x17a/0x480 net/core/dev.c:6864 __do_softirq+0x12c/0x2af kernel/softirq.c:298 run_ksoftirqd+0x13/0x20 kernel/softirq.c:653 smpboot_thread_fn+0x33f/0x510 kernel/smpboot.c:165 kthread+0x1fc/0x220 kernel/kthread.c:292 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296
Fix it by using READ_ONCE() and WRITE_ONCE() to read and write memcg->socket_pressure.
Link: https://lkml.kernel.org/r/20211025082843.671690-1-songyuanzheng@huawei.com Signed-off-by: Yuanzheng Song songyuanzheng@huawei.com Reviewed-by: Muchun Song songmuchun@bytedance.com Cc: Shakeel Butt shakeelb@google.com Cc: Roman Gushchin guro@fb.com Cc: Johannes Weiner hannes@cmpxchg.org Cc: Michal Hocko mhocko@suse.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Alex Shi alexs@kernel.org Cc: Wei Yang richard.weiyang@gmail.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Cai Xinchen caixinchen1@huawei.com Reviewed-by: Wang Weiyang wangweiyang2@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- include/linux/memcontrol.h | 2 +- mm/vmpressure.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index abb2476bd282..89e4f01c5710 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -1921,7 +1921,7 @@ static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg) if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && memcg->tcpmem_pressure) return true; do { - if (time_before(jiffies, memcg->socket_pressure)) + if (time_before(jiffies, READ_ONCE(memcg->socket_pressure))) return true; } while ((memcg = parent_mem_cgroup(memcg))); return false; diff --git a/mm/vmpressure.c b/mm/vmpressure.c index d69019fc3789..c6a7eca666f6 100644 --- a/mm/vmpressure.c +++ b/mm/vmpressure.c @@ -304,7 +304,7 @@ void vmpressure(gfp_t gfp, struct mem_cgroup *memcg, bool tree, * asserted for a second in which subsequent * pressure events can occur. */ - memcg->socket_pressure = jiffies + HZ; + WRITE_ONCE(memcg->socket_pressure, jiffies + HZ); } } }
From: Szymon Heidrich szymon.heidrich@gmail.com
maillist inclusion category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6AQJP CVE: CVE-2023-23559
Reference: https://patchwork.kernel.org/project/linux-wireless/patch/20230111175031.704...
-------------------------------
Since resplen and respoffs are signed integers sufficiently large values of unsigned int len and offset members of RNDIS response will result in negative values of prior variables. This may be utilized to bypass implemented security checks to either extract memory contents by manipulating offset or overflow the data buffer via memcpy by manipulating both offset and len.
Additionally assure that sum of resplen and respoffs does not overflow so buffer boundaries are kept.
Fixes: 80f8c5b434f9 ("rndis_wlan: copy only useful data from rndis_command respond") Signed-off-by: Szymon Heidrich szymon.heidrich@gmail.com Signed-off-by: Wang Yufen wangyufen@huawei.com Reviewed-by: Wei Yongjun weiyongjun1@huawei.com Reviewed-by: Wang Weiyang wangweiyang2@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- drivers/net/wireless/rndis_wlan.c | 19 ++++++------------- 1 file changed, 6 insertions(+), 13 deletions(-)
diff --git a/drivers/net/wireless/rndis_wlan.c b/drivers/net/wireless/rndis_wlan.c index 75b5d545b49e..dc076d844868 100644 --- a/drivers/net/wireless/rndis_wlan.c +++ b/drivers/net/wireless/rndis_wlan.c @@ -694,8 +694,8 @@ static int rndis_query_oid(struct usbnet *dev, u32 oid, void *data, int *len) struct rndis_query *get; struct rndis_query_c *get_c; } u; - int ret, buflen; - int resplen, respoffs, copylen; + int ret; + size_t buflen, resplen, respoffs, copylen;
buflen = *len + sizeof(*u.get); if (buflen < CONTROL_BUFFER_SIZE) @@ -730,22 +730,15 @@ static int rndis_query_oid(struct usbnet *dev, u32 oid, void *data, int *len)
if (respoffs > buflen) { /* Device returned data offset outside buffer, error. */ - netdev_dbg(dev->net, "%s(%s): received invalid " - "data offset: %d > %d\n", __func__, - oid_to_string(oid), respoffs, buflen); + netdev_dbg(dev->net, + "%s(%s): received invalid data offset: %zu > %zu\n", + __func__, oid_to_string(oid), respoffs, buflen);
ret = -EINVAL; goto exit_unlock; }
- if ((resplen + respoffs) > buflen) { - /* Device would have returned more data if buffer would - * have been big enough. Copy just the bits that we got. - */ - copylen = buflen - respoffs; - } else { - copylen = resplen; - } + copylen = min(resplen, buflen - respoffs);
if (copylen > *len) copylen = *len;
From: Neel Natu neelnatu@google.com
mainline inclusion from mainline-v6.1-rc1 commit 9847f21225c4eb0b843cb2b72ed83b32edb1e6f2 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6AXHU CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?...
-----------------------------------------
An argument list like "arg=val arg2 "" can trigger a page fault if the page pointed by 'args[0xffffffff]' is not mapped and potential memory corruption otherwise (unlikely but possible if the bogus address is mapped and contents happen to match the ascii value of the quote character).
The fix is to ensure that we load 'args[i-1]' only when (i > 0).
Prior to this commit the following command would trigger an unhandled page fault in the kernel:
root@(none):/linus/fs/fat# insmod ./fat.ko "foo=bar "" [ 33.870507] BUG: unable to handle page fault for address: ffff888204252608 [ 33.872180] #PF: supervisor read access in kernel mode [ 33.873414] #PF: error_code(0x0000) - not-present page [ 33.874650] PGD 4401067 P4D 4401067 PUD 0 [ 33.875321] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC PTI [ 33.876113] CPU: 16 PID: 399 Comm: insmod Not tainted 5.19.0-dbg-DEV #4 [ 33.877193] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014 [ 33.878739] RIP: 0010:next_arg+0xd1/0x110 [ 33.879399] Code: 22 75 1d 41 c6 04 01 00 41 80 f8 22 74 18 eb 35 4c 89 0e 45 31 d2 4c 89 cf 48 c7 02 00 00 00 00 41 80 f8 22 75 1f 41 8d 42 ff <41> 80 3c 01 22 75 14 41 c6 04 01 00 eb 0d 48 c7 02 00 00 00 00 41 [ 33.882338] RSP: 0018:ffffc90001253d08 EFLAGS: 00010246 [ 33.883174] RAX: 00000000ffffffff RBX: ffff888104252608 RCX: 0fc317bba1c1dd00 [ 33.884311] RDX: ffffc90001253d40 RSI: ffffc90001253d48 RDI: ffff888104252609 [ 33.885450] RBP: ffffc90001253d10 R08: 0000000000000022 R09: ffff888104252609 [ 33.886595] R10: 0000000000000000 R11: ffffffff82c7ff20 R12: 0000000000000282 [ 33.887748] R13: 00000000ffff8000 R14: 0000000000000000 R15: 0000000000007fff [ 33.888887] FS: 00007f04ec7432c0(0000) GS:ffff88813d300000(0000) knlGS:0000000000000000 [ 33.890183] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 33.891111] CR2: ffff888204252608 CR3: 0000000100f36005 CR4: 0000000000170ee0 [ 33.892241] Call Trace: [ 33.892641] <TASK> [ 33.892989] parse_args+0x8f/0x220 [ 33.893538] load_module+0x138b/0x15a0 [ 33.894149] ? prepare_coming_module+0x50/0x50 [ 33.894879] ? kernel_read_file_from_fd+0x5f/0x90 [ 33.895639] __se_sys_finit_module+0xce/0x130 [ 33.896342] __x64_sys_finit_module+0x1d/0x20 [ 33.897042] do_syscall_64+0x44/0xa0 [ 33.897622] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 33.898434] RIP: 0033:0x7f04ec85ef79 [ 33.899009] Code: 48 8d 3d da db 0d 00 0f 05 eb a5 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c7 9e 0d 00 f7 d8 64 89 01 48 [ 33.901912] RSP: 002b:00007fffae81bfe8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 33.903081] RAX: ffffffffffffffda RBX: 0000559c5f1d2640 RCX: 00007f04ec85ef79 [ 33.904191] RDX: 0000000000000000 RSI: 0000559c5f1d12a0 RDI: 0000000000000003 [ 33.905304] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 [ 33.906421] R10: 0000000000000003 R11: 0000000000000246 R12: 0000559c5f1d12a0 [ 33.907526] R13: 0000000000000000 R14: 0000559c5f1d25f0 R15: 0000559c5f1d12a0 [ 33.908631] </TASK> [ 33.908986] Modules linked in: fat(+) [last unloaded: fat] [ 33.909843] CR2: ffff888204252608 [ 33.910375] ---[ end trace 0000000000000000 ]--- [ 33.911172] RIP: 0010:next_arg+0xd1/0x110 [ 33.911796] Code: 22 75 1d 41 c6 04 01 00 41 80 f8 22 74 18 eb 35 4c 89 0e 45 31 d2 4c 89 cf 48 c7 02 00 00 00 00 41 80 f8 22 75 1f 41 8d 42 ff <41> 80 3c 01 22 75 14 41 c6 04 01 00 eb 0d 48 c7 02 00 00 00 00 41 [ 33.914643] RSP: 0018:ffffc90001253d08 EFLAGS: 00010246 [ 33.915446] RAX: 00000000ffffffff RBX: ffff888104252608 RCX: 0fc317bba1c1dd00 [ 33.916544] RDX: ffffc90001253d40 RSI: ffffc90001253d48 RDI: ffff888104252609 [ 33.917636] RBP: ffffc90001253d10 R08: 0000000000000022 R09: ffff888104252609 [ 33.918727] R10: 0000000000000000 R11: ffffffff82c7ff20 R12: 0000000000000282 [ 33.919821] R13: 00000000ffff8000 R14: 0000000000000000 R15: 0000000000007fff [ 33.920908] FS: 00007f04ec7432c0(0000) GS:ffff88813d300000(0000) knlGS:0000000000000000 [ 33.922125] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 33.923017] CR2: ffff888204252608 CR3: 0000000100f36005 CR4: 0000000000170ee0 [ 33.924098] Kernel panic - not syncing: Fatal exception [ 33.925776] Kernel Offset: disabled [ 33.926347] Rebooting in 10 seconds..
Link: https://lkml.kernel.org/r/20220728232434.1666488-1-neelnatu@google.com Signed-off-by: Neel Natu neelnatu@google.com Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Zhang Zekun zhangzekun11@huawei.com Reviewed-by: Weilong Chen chenweilong@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- lib/cmdline.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/cmdline.c b/lib/cmdline.c index fbb9981a04a4..75cb23b0ee69 100644 --- a/lib/cmdline.c +++ b/lib/cmdline.c @@ -235,7 +235,7 @@ char *next_arg(char *args, char **param, char **val) args[i-1] = '\0'; } } - if (quoted && args[i-1] == '"') + if (quoted && i > 0 && args[i-1] == '"') args[i-1] = '\0';
if (args[i]) {
From: Marek Bykowski marek.bykowski@gmail.com
mainline inclusion from mainline-v6.1-rc1 commit d5e3050c0feb8bf7b9a75482fafcc31b90257926 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6AXHU CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?...
--------------------------------------------
If the properties 'linux,initrd-start' and 'linux,initrd-end' of the chosen node populated from the bootloader, eg. U-Boot, are so that start > end, then the phys_initrd_size calculated from end - start is negative that subsequently gets converted to a high positive value for being unsigned long long. Then, the memory region with the (invalid) size is added to the bootmem and attempted being paged in paging_init() that results in the kernel fault.
For example, on the FVP ARM64 system I'm running, the U-Boot populates the 'linux,initrd-start' with 8800_0000 and 'linux,initrd-end' with 0. The phys_initrd_size calculated is then ffff_ffff_7800_0000 (= 0 - 8800_0000 = -8800_0000 + ULLONG_MAX + 1). paging_init() then attempts to map the address 8800_0000 + ffff_ffff_7800_0000 and oops'es as below.
It should be stressed, it is generally a fault of the bootloader's with the kernel relying on it, however we should not allow the bootloader's misconfiguration to lead to the kernel oops. Not only the kernel should be bullet proof against it but also finding the root cause of the paging fault spanning over the bootloader, DT, and kernel may happen is not so easy.
Unable to handle kernel paging request at virtual address fffffffefe43c000 Mem abort info: ESR = 0x96000007 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000007 CM = 0, WnR = 0 swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000080e3d000 [fffffffefe43c000] pgd=0000000080de9003, pud=0000000080de9003 Unable to handle kernel paging request at virtual address ffffff8000de9f90 Mem abort info: ESR = 0x96000005 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000005 CM = 0, WnR = 0 swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000080e3d000 [ffffff8000de9f90] pgd=0000000000000000, pud=0000000000000000 Internal error: Oops: 96000005 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 5.4.51-yocto-standard #1 Hardware name: FVP Base (DT) pstate: 60000085 (nZCv daIf -PAN -UAO) pc : show_pte+0x12c/0x1b4 lr : show_pte+0x100/0x1b4 sp : ffffffc010ce3b30 x29: ffffffc010ce3b30 x28: ffffffc010ceed80 x27: fffffffefe43c000 x26: fffffffefe43a028 x25: 0000000080bf0000 x24: 0000000000000025 x23: ffffffc010b8d000 x22: ffffffc010e3d000 x23: ffffffc010b8d000 x22: ffffffc010e3d000 x21: 0000000080de9000 x20: ffffff7f80000f90 x19: fffffffefe43c000 x18: 0000000000000030 x17: 0000000000001400 x16: 0000000000001c00 x15: ffffffc010cef1b8 x14: ffffffffffffffff x13: ffffffc010df1f40 x12: ffffffc010df1b70 x11: ffffffc010ce3b30 x10: ffffffc010ce3b30 x9 : 00000000ffffffc8 x8 : 0000000000000000 x7 : 000000000000000f x6 : ffffffc010df16e8 x5 : 0000000000000000 x4 : 0000000000000000 x3 : 00000000ffffffff x2 : 0000000000000000 x1 : 0000008080000000 x0 : ffffffc010af1d68 Call trace: show_pte+0x12c/0x1b4 die_kernel_fault+0x54/0x78 __do_kernel_fault+0x11c/0x128 do_translation_fault+0x58/0xac do_mem_abort+0x50/0xb0 el1_da+0x1c/0x90 __create_pgd_mapping+0x348/0x598 paging_init+0x3f0/0x70d0 setup_arch+0x2c0/0x5d4 start_kernel+0x94/0x49c Code: 92748eb5 900052a0 9135a000 cb010294 (f8756a96)
Signed-off-by: Marek Bykowski marek.bykowski@gmail.com Link: https://lore.kernel.org/r/20220909023358.76881-1-marek.bykowski@gmail.com Signed-off-by: Rob Herring robh@kernel.org Signed-off-by: Zhang Zekun zhangzekun11@huawei.com Reviewed-by: Weilong Chen chenweilong@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- drivers/of/fdt.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c index 57ff31b6b1e4..81aede5c2dc6 100644 --- a/drivers/of/fdt.c +++ b/drivers/of/fdt.c @@ -885,6 +885,8 @@ static void __init early_init_dt_check_for_initrd(unsigned long node) if (!prop) return; end = of_read_number(prop, len/4); + if (start > end) + return;
__early_init_dt_declare_initrd(start, end); phys_initrd_start = start;
From: Zhang Wensheng zhangwensheng5@huawei.com
hulk inclusion category: bugfix bugzilla: 187025, https://gitee.com/openeuler/kernel/issues/I6B1LN CVE: NA
--------------------------------
Kasan report a bug like below: [ 494.865170] ================================================================== [ 494.901335] BUG: KASAN: slab-out-of-bounds in ses_enclosure_data_process+0x234/0x6f0 [ses] [ 494.901347] Write of size 1 at addr ffff8882f3181a70 by task systemd-udevd/1704 [ 494.931929] i801_smbus 0000:00:1f.4: SPD Write Disable is set
[ 494.944092] CPU: 12 PID: 1704 Comm: systemd-udevd Tainted: G [ 494.944101] Hardware name: Huawei 2288H V5/BC11SPSCB0, BIOS 7.01 11/13/2019 [ 494.964003] i801_smbus 0000:00:1f.4: SMBus using PCI interrupt [ 494.978532] Call Trace: [ 494.978544] dump_stack+0xbe/0xf9 [ 494.978558] print_address_description.constprop.0+0x19/0x130 [ 495.092838] ? ses_enclosure_data_process+0x234/0x6f0 [ses] [ 495.092846] __kasan_report.cold+0x68/0x80 [ 495.092855] ? __kasan_kmalloc.constprop.0+0x71/0xd0 [ 495.092862] ? ses_enclosure_data_process+0x234/0x6f0 [ses] [ 495.092868] kasan_report+0x3a/0x50 [ 495.092875] ses_enclosure_data_process+0x234/0x6f0 [ses] [ 495.092882] ? mutex_unlock+0x1d/0x40 [ 495.092889] ses_intf_add+0x57f/0x910 [ses] [ 495.092900] class_interface_register+0x26d/0x290 [ 495.092906] ? class_destroy+0xd0/0xd0 [ 495.092912] ? 0xffffffffc0bf8000 [ 495.092919] ses_init+0x18/0x1000 [ses] [ 495.092927] do_one_initcall+0xcb/0x370 [ 495.092934] ? initcall_blacklisted+0x1b0/0x1b0 [ 495.092942] ? create_object.isra.0+0x330/0x3a0 [ 495.092950] ? kasan_unpoison_shadow+0x33/0x40 [ 495.092957] ? kasan_unpoison_shadow+0x33/0x40 [ 495.092966] do_init_module+0xe4/0x3a0 [ 495.092972] load_module+0xd0a/0xdd0 [ 495.092980] ? layout_and_allocate+0x300/0x300 [ 495.092989] ? seccomp_run_filters+0x1d6/0x2c0 [ 495.092999] ? kernel_read_file_from_fd+0xb3/0xe0 [ 495.093006] __se_sys_finit_module+0x11b/0x1b0 [ 495.093012] ? __ia32_sys_init_module+0x40/0x40 [ 495.093023] ? __audit_syscall_entry+0x226/0x290 [ 495.093032] do_syscall_64+0x33/0x40 [ 495.093041] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 495.093046] RIP: 0033:0x7f39c3376089 [ 495.093054] Code: 00 48 81 c4 80 00 00 00 89 f0 c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e7 dd 0b 00 f7 d8 64 89 01 48 [ 495.093058] RSP: 002b:00007ffdc6009e18 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 495.093068] RAX: ffffffffffffffda RBX: 000055d4192801c0 RCX: 00007f39c3376089 [ 495.093072] RDX: 0000000000000000 RSI: 00007f39c2fae99d RDI: 000000000000000f [ 495.093076] RBP: 00007f39c2fae99d R08: 0000000000000000 R09: 0000000000000001 [ 495.093080] R10: 000000000000000f R11: 0000000000000246 R12: 0000000000000000 [ 495.093084] R13: 000055d419282e00 R14: 0000000000020000 R15: 000055d41927f1f0
[ 495.093091] Allocated by task 1704: [ 495.093098] kasan_save_stack+0x1b/0x40 [ 495.093105] __kasan_kmalloc.constprop.0+0xc2/0xd0 [ 495.093111] ses_enclosure_data_process+0x65d/0x6f0 [ses] [ 495.093117] ses_intf_add+0x57f/0x910 [ses] [ 495.093123] class_interface_register+0x26d/0x290 [ 495.093129] ses_init+0x18/0x1000 [ses] [ 495.093134] do_one_initcall+0xcb/0x370 [ 495.093139] do_init_module+0xe4/0x3a0 [ 495.093144] load_module+0xd0a/0xdd0 [ 495.093150] __se_sys_finit_module+0x11b/0x1b0 [ 495.093155] do_syscall_64+0x33/0x40 [ 495.093162] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 495.093168] The buggy address belongs to the object at ffff8882f3181a40 which belongs to the cache kmalloc-64 of size 64 [ 495.093173] The buggy address is located 48 bytes inside of 64-byte region [ffff8882f3181a40, ffff8882f3181a80) [ 495.093175] The buggy address belongs to the page: [ 495.093181] page:ffffea000bcc6000 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x2f3180 [ 495.093186] head:ffffea000bcc6000 order:2 compound_mapcount:0 compound_pincount:0 [ 495.093194] flags: 0x17ffe0000010200(slab|head|node=0|zone=2|lastcpupid=0x3fff) [ 495.093204] raw: 017ffe0000010200 ffffea0016e5fb08 ffffea0016921508 ffff888100050e00 [ 495.093211] raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000 [ 495.093213] page dumped because: kasan: bad access detected
[ 495.093216] Memory state around the buggy address: [ 495.093222] ffff8882f3181900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 495.093227] ffff8882f3181980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 495.093231] >ffff8882f3181a00: fc fc fc fc fc fc fc fc 00 00 00 00 01 fc fc fc [ 495.093234] ^ [ 495.093239] ffff8882f3181a80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 495.093244] ffff8882f3181b00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 495.093246] ==================================================================
After analysis on vmcore, it was found that the line "desc_ptr[len] = '\0';" has slab-out-of-bounds problem in ses_enclosure_data_process. In ses_enclosure_data_process, "desc_ptr" point to "buf", so it have to be limited in the memory of "buf", however. although there is "desc_ptr >= buf + page7_len" judgment, it does not work because "desc_ptr + 4 + len" may bigger than "buf + page7_len", which will lead to slab-out-of-bounds problem.
Signed-off-by: Zhang Wensheng zhangwensheng5@huawei.com Signed-off-by: Li Nan linan122@huawei.com Reviewed-by: Hou Tao houtao1@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- drivers/scsi/ses.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c index 119a51015643..cdf126bd0ffb 100644 --- a/drivers/scsi/ses.c +++ b/drivers/scsi/ses.c @@ -556,11 +556,11 @@ static void ses_enclosure_data_process(struct enclosure_device *edev, struct enclosure_component *ecomp;
if (desc_ptr) { - if (desc_ptr >= buf + page7_len) { + len = (desc_ptr[2] << 8) + desc_ptr[3]; + desc_ptr += 4; + if (desc_ptr + len > buf + page7_len) { desc_ptr = NULL; } else { - len = (desc_ptr[2] << 8) + desc_ptr[3]; - desc_ptr += 4; /* Add trailing zero - pushes into * reserved space */ desc_ptr[len] = '\0';
From: Jamal Hadi Salim jhs@mojatatu.com
stable inclusion from stable-v5.10.162 commit 5f65f48516bfeebaab1ccc52c8fad698ddf21282 category: bugfix bugzilla: 188250, https://gitee.com/src-openeuler/kernel/issues/I6AQG9 CVE: CVE-2023-23455
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit a2965c7be0522eaa18808684b7b82b248515511b ]
If asked to drop a packet via TC_ACT_SHOT it is unsafe to assume res.class contains a valid pointer Fixes: b0188d4dbe5f ("[NET_SCHED]: sch_atm: Lindent")
Signed-off-by: Jamal Hadi Salim jhs@mojatatu.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Dong Chenchen dongchenchen2@huawei.com Reviewed-by: Yue Haibing yuehaibing@huawei.com Reviewed-by: Wang Weiyang wangweiyang2@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- net/sched/sch_atm.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c index 1c281cc81f57..5e0d55ac9c5d 100644 --- a/net/sched/sch_atm.c +++ b/net/sched/sch_atm.c @@ -396,10 +396,13 @@ static int atm_tc_enqueue(struct sk_buff *skb, struct Qdisc *sch, result = tcf_classify(skb, fl, &res, true); if (result < 0) continue; + if (result == TC_ACT_SHOT) + goto done; + flow = (struct atm_flow_data *)res.class; if (!flow) flow = lookup_flow(sch, res.classid); - goto done; + goto drop; } } flow = NULL;
From: Jamal Hadi Salim jhs@mojatatu.com
stable inclusion from stable-v5.10.163 commit b2c917e510e5ddbc7896329c87d20036c8b82952 category: bugfix bugzilla: 188272, https://gitee.com/src-openeuler/kernel/issues/I6AQIL CVE: CVE-2023-23454
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
[ Upstream commit caa4b35b4317d5147b3ab0fbdc9c075c7d2e9c12 ]
If asked to drop a packet via TC_ACT_SHOT it is unsafe to assume that res.class contains a valid pointer
Sample splat reported by Kyle Zeng
[ 5.405624] 0: reclassify loop, rule prio 0, protocol 800 [ 5.406326] ================================================================== [ 5.407240] BUG: KASAN: slab-out-of-bounds in cbq_enqueue+0x54b/0xea0 [ 5.407987] Read of size 1 at addr ffff88800e3122aa by task poc/299 [ 5.408731] [ 5.408897] CPU: 0 PID: 299 Comm: poc Not tainted 5.10.155+ #15 [ 5.409516] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 [ 5.410439] Call Trace: [ 5.410764] dump_stack+0x87/0xcd [ 5.411153] print_address_description+0x7a/0x6b0 [ 5.411687] ? vprintk_func+0xb9/0xc0 [ 5.411905] ? printk+0x76/0x96 [ 5.412110] ? cbq_enqueue+0x54b/0xea0 [ 5.412323] kasan_report+0x17d/0x220 [ 5.412591] ? cbq_enqueue+0x54b/0xea0 [ 5.412803] __asan_report_load1_noabort+0x10/0x20 [ 5.413119] cbq_enqueue+0x54b/0xea0 [ 5.413400] ? __kasan_check_write+0x10/0x20 [ 5.413679] __dev_queue_xmit+0x9c0/0x1db0 [ 5.413922] dev_queue_xmit+0xc/0x10 [ 5.414136] ip_finish_output2+0x8bc/0xcd0 [ 5.414436] __ip_finish_output+0x472/0x7a0 [ 5.414692] ip_finish_output+0x5c/0x190 [ 5.414940] ip_output+0x2d8/0x3c0 [ 5.415150] ? ip_mc_finish_output+0x320/0x320 [ 5.415429] __ip_queue_xmit+0x753/0x1760 [ 5.415664] ip_queue_xmit+0x47/0x60 [ 5.415874] __tcp_transmit_skb+0x1ef9/0x34c0 [ 5.416129] tcp_connect+0x1f5e/0x4cb0 [ 5.416347] tcp_v4_connect+0xc8d/0x18c0 [ 5.416577] __inet_stream_connect+0x1ae/0xb40 [ 5.416836] ? local_bh_enable+0x11/0x20 [ 5.417066] ? lock_sock_nested+0x175/0x1d0 [ 5.417309] inet_stream_connect+0x5d/0x90 [ 5.417548] ? __inet_stream_connect+0xb40/0xb40 [ 5.417817] __sys_connect+0x260/0x2b0 [ 5.418037] __x64_sys_connect+0x76/0x80 [ 5.418267] do_syscall_64+0x31/0x50 [ 5.418477] entry_SYSCALL_64_after_hwframe+0x61/0xc6 [ 5.418770] RIP: 0033:0x473bb7 [ 5.418952] Code: 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 18 89 54 24 0c 48 89 34 24 89 [ 5.420046] RSP: 002b:00007fffd20eb0f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a [ 5.420472] RAX: ffffffffffffffda RBX: 00007fffd20eb578 RCX: 0000000000473bb7 [ 5.420872] RDX: 0000000000000010 RSI: 00007fffd20eb110 RDI: 0000000000000007 [ 5.421271] RBP: 00007fffd20eb150 R08: 0000000000000001 R09: 0000000000000004 [ 5.421671] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001 [ 5.422071] R13: 00007fffd20eb568 R14: 00000000004fc740 R15: 0000000000000002 [ 5.422471] [ 5.422562] Allocated by task 299: [ 5.422782] __kasan_kmalloc+0x12d/0x160 [ 5.423007] kasan_kmalloc+0x5/0x10 [ 5.423208] kmem_cache_alloc_trace+0x201/0x2e0 [ 5.423492] tcf_proto_create+0x65/0x290 [ 5.423721] tc_new_tfilter+0x137e/0x1830 [ 5.423957] rtnetlink_rcv_msg+0x730/0x9f0 [ 5.424197] netlink_rcv_skb+0x166/0x300 [ 5.424428] rtnetlink_rcv+0x11/0x20 [ 5.424639] netlink_unicast+0x673/0x860 [ 5.424870] netlink_sendmsg+0x6af/0x9f0 [ 5.425100] __sys_sendto+0x58d/0x5a0 [ 5.425315] __x64_sys_sendto+0xda/0xf0 [ 5.425539] do_syscall_64+0x31/0x50 [ 5.425764] entry_SYSCALL_64_after_hwframe+0x61/0xc6 [ 5.426065] [ 5.426157] The buggy address belongs to the object at ffff88800e312200 [ 5.426157] which belongs to the cache kmalloc-128 of size 128 [ 5.426955] The buggy address is located 42 bytes to the right of [ 5.426955] 128-byte region [ffff88800e312200, ffff88800e312280) [ 5.427688] The buggy address belongs to the page: [ 5.427992] page:000000009875fabc refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xe312 [ 5.428562] flags: 0x100000000000200(slab) [ 5.428812] raw: 0100000000000200 dead000000000100 dead000000000122 ffff888007843680 [ 5.429325] raw: 0000000000000000 0000000000100010 00000001ffffffff ffff88800e312401 [ 5.429875] page dumped because: kasan: bad access detected [ 5.430214] page->mem_cgroup:ffff88800e312401 [ 5.430471] [ 5.430564] Memory state around the buggy address: [ 5.430846] ffff88800e312180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 5.431267] ffff88800e312200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fc [ 5.431705] >ffff88800e312280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 5.432123] ^ [ 5.432391] ffff88800e312300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fc [ 5.432810] ffff88800e312380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 5.433229] ================================================================== [ 5.433648] Disabling lock debugging due to kernel taint
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Kyle Zeng zengyhkyle@gmail.com Signed-off-by: Jamal Hadi Salim jhs@mojatatu.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Zhengchao Shao shaozhengchao@huawei.com Reviewed-by: Yue Haibing yuehaibing@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- net/sched/sch_cbq.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c index 4a78fcf5d4f9..01853de86bc0 100644 --- a/net/sched/sch_cbq.c +++ b/net/sched/sch_cbq.c @@ -231,6 +231,8 @@ cbq_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr) result = tcf_classify(skb, fl, &res, true); if (!fl || result < 0) goto fallback; + if (result == TC_ACT_SHOT) + return NULL;
cl = (void *)res.class; if (!cl) { @@ -251,8 +253,6 @@ cbq_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr) case TC_ACT_TRAP: *qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN; fallthrough; - case TC_ACT_SHOT: - return NULL; case TC_ACT_RECLASSIFY: return cbq_reclassify(skb, cl); }
From: Wu Guanghao wuguanghao3@huawei.com
mainline inclusion from mainline-v6.2-rc1 commit 4da112513c01d7d0acf1025b8764349d46e177d6 category: bugfix bugzilla: 187874,https://gitee.com/openeuler/kernel/issues/I4KIAO CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
We are doing a test about deleting a large number of files when memory is low. A deadlock problem was found.
[ 1240.279183] -> #1 (fs_reclaim){+.+.}-{0:0}: [ 1240.280450] lock_acquire+0x197/0x460 [ 1240.281548] fs_reclaim_acquire.part.0+0x20/0x30 [ 1240.282625] kmem_cache_alloc+0x2b/0x940 [ 1240.283816] xfs_trans_alloc+0x8a/0x8b0 [ 1240.284757] xfs_inactive_ifree+0xe4/0x4e0 [ 1240.285935] xfs_inactive+0x4e9/0x8a0 [ 1240.286836] xfs_inodegc_worker+0x160/0x5e0 [ 1240.287969] process_one_work+0xa19/0x16b0 [ 1240.289030] worker_thread+0x9e/0x1050 [ 1240.290131] kthread+0x34f/0x460 [ 1240.290999] ret_from_fork+0x22/0x30 [ 1240.291905] [ 1240.291905] -> #0 ((work_completion)(&gc->work)){+.+.}-{0:0}: [ 1240.293569] check_prev_add+0x160/0x2490 [ 1240.294473] __lock_acquire+0x2c4d/0x5160 [ 1240.295544] lock_acquire+0x197/0x460 [ 1240.296403] __flush_work+0x6bc/0xa20 [ 1240.297522] xfs_inode_mark_reclaimable+0x6f0/0xdc0 [ 1240.298649] destroy_inode+0xc6/0x1b0 [ 1240.299677] dispose_list+0xe1/0x1d0 [ 1240.300567] prune_icache_sb+0xec/0x150 [ 1240.301794] super_cache_scan+0x2c9/0x480 [ 1240.302776] do_shrink_slab+0x3f0/0xaa0 [ 1240.303671] shrink_slab+0x170/0x660 [ 1240.304601] shrink_node+0x7f7/0x1df0 [ 1240.305515] balance_pgdat+0x766/0xf50 [ 1240.306657] kswapd+0x5bd/0xd20 [ 1240.307551] kthread+0x34f/0x460 [ 1240.308346] ret_from_fork+0x22/0x30 [ 1240.309247] [ 1240.309247] other info that might help us debug this: [ 1240.309247] [ 1240.310944] Possible unsafe locking scenario: [ 1240.310944] [ 1240.312379] CPU0 CPU1 [ 1240.313363] ---- ---- [ 1240.314433] lock(fs_reclaim); [ 1240.315107] lock((work_completion)(&gc->work)); [ 1240.316828] lock(fs_reclaim); [ 1240.318088] lock((work_completion)(&gc->work)); [ 1240.319203] [ 1240.319203] *** DEADLOCK *** ... [ 2438.431081] Workqueue: xfs-inodegc/sda xfs_inodegc_worker [ 2438.432089] Call Trace: [ 2438.432562] __schedule+0xa94/0x1d20 [ 2438.435787] schedule+0xbf/0x270 [ 2438.436397] schedule_timeout+0x6f8/0x8b0 [ 2438.445126] wait_for_completion+0x163/0x260 [ 2438.448610] __flush_work+0x4c4/0xa40 [ 2438.455011] xfs_inode_mark_reclaimable+0x6ef/0xda0 [ 2438.456695] destroy_inode+0xc6/0x1b0 [ 2438.457375] dispose_list+0xe1/0x1d0 [ 2438.458834] prune_icache_sb+0xe8/0x150 [ 2438.461181] super_cache_scan+0x2b3/0x470 [ 2438.461950] do_shrink_slab+0x3cf/0xa50 [ 2438.462687] shrink_slab+0x17d/0x660 [ 2438.466392] shrink_node+0x87e/0x1d40 [ 2438.467894] do_try_to_free_pages+0x364/0x1300 [ 2438.471188] try_to_free_pages+0x26c/0x5b0 [ 2438.473567] __alloc_pages_slowpath.constprop.136+0x7aa/0x2100 [ 2438.482577] __alloc_pages+0x5db/0x710 [ 2438.485231] alloc_pages+0x100/0x200 [ 2438.485923] allocate_slab+0x2c0/0x380 [ 2438.486623] ___slab_alloc+0x41f/0x690 [ 2438.490254] __slab_alloc+0x54/0x70 [ 2438.491692] kmem_cache_alloc+0x23e/0x270 [ 2438.492437] xfs_trans_alloc+0x88/0x880 [ 2438.493168] xfs_inactive_ifree+0xe2/0x4e0 [ 2438.496419] xfs_inactive+0x4eb/0x8b0 [ 2438.497123] xfs_inodegc_worker+0x16b/0x5e0 [ 2438.497918] process_one_work+0xbf7/0x1a20 [ 2438.500316] worker_thread+0x8c/0x1060 [ 2438.504938] ret_from_fork+0x22/0x30
When the memory is insufficient, xfs_inonodegc_worker will trigger memory reclamation when memory is allocated, then flush_work() may be called to wait for the work to complete. This causes a deadlock.
So use memalloc_nofs_save() to avoid triggering memory reclamation in xfs_inodegc_worker.
Signed-off-by: Wu Guanghao wuguanghao3@huawei.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Guo Xuenan guoxuenan@huawei.com Reviewed-by: Zhang Yi yi.zhang@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- fs/xfs/xfs_icache.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c index 2039423df384..019e5f019468 100644 --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -1879,12 +1879,20 @@ xfs_inodegc_worker( work); struct llist_node *node = llist_del_all(&gc->list); struct xfs_inode *ip, *n; + unsigned int nofs_flag;
WRITE_ONCE(gc->items, 0);
if (!node) return;
+ /* + * We can allocate memory here while doing writeback on behalf of + * memory reclaim. To avoid memory allocation deadlocks set the + * task-wide nofs context for the following operations. + */ + nofs_flag = memalloc_nofs_save(); + ip = llist_entry(node, struct xfs_inode, i_gclist); trace_xfs_inodegc_worker(ip->i_mount, READ_ONCE(gc->shrinker_hits));
@@ -1893,6 +1901,8 @@ xfs_inodegc_worker( xfs_iflags_set(ip, XFS_INACTIVATING); xfs_inodegc_inactivate(ip); } + + memalloc_nofs_restore(nofs_flag); }
/*
From: Pablo Neira Ayuso pablo@netfilter.org
stable inclusion from stable-v5.10.164 commit 550efeff989b041f3746118c0ddd863c39ddc1aa category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6AOP8 CVE: CVE-2023-0179
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=l...
--------------------------------
commit 696e1a48b1a1b01edad542a1ef293665864a4dd0 upstream.
If the offset + length goes over the ethernet + vlan header, then the length is adjusted to copy the bytes that are within the boundaries of the vlan_ethhdr scratchpad area. The remaining bytes beyond ethernet + vlan header are copied directly from the skbuff data area.
Fix incorrect arithmetic operator: subtract, not add, the size of the vlan header in case of double-tagged packets to adjust the length accordingly to address CVE-2023-0179.
Reported-by: Davide Ornaghi d.ornaghi97@gmail.com Fixes: f6ae9f120dad ("netfilter: nft_payload: add C-VLAN support") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Ziyang Xuan william.xuanziyang@huawei.com Reviewed-by: Yue Haibing yuehaibing@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- net/netfilter/nft_payload.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/nft_payload.c b/net/netfilter/nft_payload.c index 551e0d6cf63d..74c220eeec1a 100644 --- a/net/netfilter/nft_payload.c +++ b/net/netfilter/nft_payload.c @@ -62,7 +62,7 @@ nft_payload_copy_vlan(u32 *d, const struct sk_buff *skb, u8 offset, u8 len) return false;
if (offset + len > VLAN_ETH_HLEN + vlan_hlen) - ethlen -= offset + len - VLAN_ETH_HLEN + vlan_hlen; + ethlen -= offset + len - VLAN_ETH_HLEN - vlan_hlen;
memcpy(dst_u8, vlanh + offset - vlan_hlen, ethlen);
From: Frederick Lawler fred@cloudflare.com
stable inclusion from stable-v5.10.163 commit 9f7bc28a6b8afc2274e25650511555e93f45470f category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6B0FA CVE: CVE-2022-47929
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
commit 96398560f26aa07e8f2969d73c8197e6a6d10407 upstream.
While experimenting with applying noqueue to a classful queue discipline, we discovered a NULL pointer dereference in the __dev_queue_xmit() path that generates a kernel OOPS:
# dev=enp0s5 # tc qdisc replace dev $dev root handle 1: htb default 1 # tc class add dev $dev parent 1: classid 1:1 htb rate 10mbit # tc qdisc add dev $dev parent 1:1 handle 10: noqueue # ping -I $dev -w 1 -c 1 1.1.1.1
[ 2.172856] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 2.173217] #PF: supervisor instruction fetch in kernel mode ... [ 2.178451] Call Trace: [ 2.178577] <TASK> [ 2.178686] htb_enqueue+0x1c8/0x370 [ 2.178880] dev_qdisc_enqueue+0x15/0x90 [ 2.179093] __dev_queue_xmit+0x798/0xd00 [ 2.179305] ? _raw_write_lock_bh+0xe/0x30 [ 2.179522] ? __local_bh_enable_ip+0x32/0x70 [ 2.179759] ? ___neigh_create+0x610/0x840 [ 2.179968] ? eth_header+0x21/0xc0 [ 2.180144] ip_finish_output2+0x15e/0x4f0 [ 2.180348] ? dst_output+0x30/0x30 [ 2.180525] ip_push_pending_frames+0x9d/0xb0 [ 2.180739] raw_sendmsg+0x601/0xcb0 [ 2.180916] ? _raw_spin_trylock+0xe/0x50 [ 2.181112] ? _raw_spin_unlock_irqrestore+0x16/0x30 [ 2.181354] ? get_page_from_freelist+0xcd6/0xdf0 [ 2.181594] ? sock_sendmsg+0x56/0x60 [ 2.181781] sock_sendmsg+0x56/0x60 [ 2.181958] __sys_sendto+0xf7/0x160 [ 2.182139] ? handle_mm_fault+0x6e/0x1d0 [ 2.182366] ? do_user_addr_fault+0x1e1/0x660 [ 2.182627] __x64_sys_sendto+0x1b/0x30 [ 2.182881] do_syscall_64+0x38/0x90 [ 2.183085] entry_SYSCALL_64_after_hwframe+0x63/0xcd ... [ 2.187402] </TASK>
Previously in commit d66d6c3152e8 ("net: sched: register noqueue qdisc"), NULL was set for the noqueue discipline on noqueue init so that __dev_queue_xmit() falls through for the noqueue case. This also sets a bypass of the enqueue NULL check in the register_qdisc() function for the struct noqueue_disc_ops.
Classful queue disciplines make it past the NULL check in __dev_queue_xmit() because the discipline is set to htb (in this case), and then in the call to __dev_xmit_skb(), it calls into htb_enqueue() which grabs a leaf node for a class and then calls qdisc_enqueue() by passing in a queue discipline which assumes ->enqueue() is not set to NULL.
Fix this by not allowing classes to be assigned to the noqueue discipline. Linux TC Notes states that classes cannot be set to the noqueue discipline. [1] Let's enforce that here.
Links: 1. https://linux-tc-notes.sourceforge.net/tc/doc/sch_noqueue.txt
Fixes: d66d6c3152e8 ("net: sched: register noqueue qdisc") Cc: stable@vger.kernel.org Signed-off-by: Frederick Lawler fred@cloudflare.com Reviewed-by: Jakub Sitnicki jakub@cloudflare.com Link: https://lore.kernel.org/r/20230109163906.706000-1-fred@cloudflare.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Baisong Zhong zhongbaisong@huawei.com Reviewed-by: Liu Jian liujian56@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- net/sched/sch_api.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c index 6e18aa417782..01bec5e746d3 100644 --- a/net/sched/sch_api.c +++ b/net/sched/sch_api.c @@ -1113,6 +1113,11 @@ static int qdisc_graft(struct net_device *dev, struct Qdisc *parent, return -ENOENT; }
+ if (new && new->ops == &noqueue_qdisc_ops) { + NL_SET_ERR_MSG(extack, "Cannot assign noqueue to a class"); + return -EINVAL; + } + err = cops->graft(parent, cl, new, &old, extack); if (err) return err;
From: Alistair Popple apopple@nvidia.com
mainline inclusion from mainline-v6.1-rc7 commit 4a955bed882e734807024afd8f53213d4c61ff97 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6BG56 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
The migrate_to_ram() callback should always succeed, but in rare cases can fail usually returning VM_FAULT_SIGBUS. Commit 16ce101db85d ("mm/memory.c: fix race when faulting a device private page") incorrectly stopped passing the return code up the stack. Fix this by setting the ret variable, restoring the previous behaviour on migrate_to_ram() failure.
Link: https://lkml.kernel.org/r/20221114115537.727371-1-apopple@nvidia.com Fixes: 16ce101db85d ("mm/memory.c: fix race when faulting a device private page") Signed-off-by: Alistair Popple apopple@nvidia.com Acked-by: David Hildenbrand david@redhat.com Reviewed-by: Felix Kuehling Felix.Kuehling@amd.com Cc: Ralph Campbell rcampbell@nvidia.com Cc: John Hubbard jhubbard@nvidia.com Cc: Alex Sierra alex.sierra@amd.com Cc: Ben Skeggs bskeggs@redhat.com Cc: Lyude Paul lyude@redhat.com Cc: Jason Gunthorpe jgg@nvidia.com Cc: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Ma Wupeng mawupeng1@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Reviewed-by: tong tiangen tongtiangen@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com --- mm/memory.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/memory.c b/mm/memory.c index badf913062a2..732895edbb67 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3415,7 +3415,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) */ get_page(vmf->page); pte_unmap_unlock(vmf->pte, vmf->ptl); - vmf->page->pgmap->ops->migrate_to_ram(vmf); + ret = vmf->page->pgmap->ops->migrate_to_ram(vmf); put_page(vmf->page); } else if (is_hwpoison_entry(entry)) { ret = VM_FAULT_HWPOISON;