Alexander Potapenko (2): mm: fs: initialize fsdata passed to write_begin/write_end interface ipv6: addrlabel: fix infoleak when sending struct ifaddrlblmsg to network
Arnd Bergmann (1): security: commoncap: fix -Wstringop-overread warning
Baisong Zhong (1): bpf, test_run: Fix alignment problem in bpf_prog_test_run_skb()
Chen Zhongjin (1): net, neigh: Fix null-ptr-deref in neigh_table_clear()
Christian A. Ehrhardt (1): kernfs: fix use-after-free in __kernfs_remove
Chuang Wang (1): net: macvlan: fix memory leaks of macvlan_common_newlink
Daniil Tatianin (1): ring_buffer: Do not deactivate non-existant pages
Eric Dumazet (1): macvlan: enforce a consistent minimal mtu
Gaosheng Cui (1): capabilities: fix potential memleak on error path from vfs_getxattr_alloc()
Ilpo Järvinen (2): serial: 8250: Fall back to non-DMA Rx if IIR_RDI occurs serial: 8250: Flush DMA Rx on RLSI
Jiri Benc (1): net: gso: fix panic on frag_list with mixed head alloc types
Kuniyuki Iwashima (1): tcp/udp: Make early_demux back namespacified.
Li Qiang (1): kprobe: reverse kp->flags when arm_kprobe failed
Neal Cardwell (1): tcp: fix indefinite deferral of RTO with SACK reneging
Rik van Riel (1): mm,hugetlb: take hugetlb_lock before decrementing h->resv_huge_pages
Seth Jenkins (1): mm: /proc/pid/smaps_rollup: fix no vma's null-deref
Wang Wensheng (1): ftrace: Optimize the allocation for mcount entries
Xia Fukun (2): ida: don't use BUG_ON() for debugging signal handling: don't use BUG_ON() for debugging
Xiu Jianfeng (1): ftrace: Fix null pointer dereference in ftrace_add_mod()
Zhang Xiaoxu (1): nfs4: Fix kmemleak when allocate slot failed
Zhengchao Shao (2): net: fix UAF issue in nfqnl_nf_hook_drop() when ops_init() failed ipv6: fix WARNING in ip6_route_net_exit_late()
drivers/net/macvlan.c | 6 ++- drivers/tty/serial/8250/8250_port.c | 7 +++- fs/buffer.c | 4 +- fs/kernfs/dir.c | 5 ++- fs/namei.c | 2 +- fs/nfs/nfs4client.c | 1 + fs/proc/task_mmu.c | 2 +- include/net/protocol.h | 4 -- include/net/tcp.h | 2 + include/net/udp.h | 1 + kernel/kprobes.c | 5 ++- kernel/signal.c | 8 ++-- kernel/trace/ftrace.c | 3 +- kernel/trace/ring_buffer.c | 4 +- lib/idr.c | 4 +- mm/filemap.c | 2 +- mm/hugetlb.c | 2 +- net/bpf/test_run.c | 1 + net/core/neighbour.c | 2 +- net/core/net_namespace.c | 7 ++++ net/core/skbuff.c | 36 +++++++++--------- net/ipv4/af_inet.c | 14 +------ net/ipv4/ip_input.c | 32 ++++++++++------ net/ipv4/sysctl_net_ipv4.c | 59 +---------------------------- net/ipv4/tcp_input.c | 3 +- net/ipv6/addrlabel.c | 1 + net/ipv6/ip6_input.c | 23 +++++++---- net/ipv6/route.c | 14 +++++-- net/ipv6/tcp_ipv6.c | 9 +---- net/ipv6/udp.c | 9 +---- security/commoncap.c | 6 ++- 31 files changed, 127 insertions(+), 151 deletions(-)
From: Xia Fukun xiafukun@huawei.com
stable inclusion from stable-v4.19.267 commit 33d2f83e3f2c1fdabb365d25bed3aa630041cbc0 category: bugfix bugzilla: 188002, https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA
--------------------------------
commit fc82bbf4dede758007763867d0282353c06d1121 upstream.
This is another old BUG_ON() that just shouldn't exist (see also commit a382f8fee42c: "signal handling: don't use BUG_ON() for debugging").
In fact, as Matthew Wilcox points out, this condition shouldn't really even result in a warning, since a negative id allocation result is just a normal allocation failure:
"I wonder if we should even warn here -- sure, the caller is trying to free something that wasn't allocated, but we don't warn for kfree(NULL)"
and goes on to point out how that current error check is only causing people to unnecessarily do their own index range checking before freeing it.
This was noted by Itay Iellin, because the bluetooth HCI socket cookie code does *not* do that range checking, and ends up just freeing the error case too, triggering the BUG_ON().
The HCI code requires CAP_NET_RAW, and seems to just result in an ugly splat, but there really is no reason to BUG_ON() here, and we have generally striven for allocation models where it's always ok to just do
free(alloc());
even if the allocation were to fail for some random reason (usually obviously that "random" reason being some resource limit).
Fixes: 88eca0207cf1 ("ida: simplified functions for id allocation") Reported-by: Itay Iellin ieitayie@gmail.com Suggested-by: Matthew Wilcox willy@infradead.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Xia Fukun xiafukun@huawei.com Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- lib/idr.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/lib/idr.c b/lib/idr.c index 6ff3b1c36e0a..82c24a417dc6 100644 --- a/lib/idr.c +++ b/lib/idr.c @@ -573,7 +573,9 @@ void ida_free(struct ida *ida, unsigned int id) { unsigned long flags;
- BUG_ON((int)id < 0); + if ((int)id < 0) + return; + xa_lock_irqsave(&ida->ida_rt, flags); ida_remove(ida, id); xa_unlock_irqrestore(&ida->ida_rt, flags);
From: Xia Fukun xiafukun@huawei.com
stable inclusion from stable-v4.19.267 commit 93d9cef55f8fe463e3b9f6c73c7a32619222c657 category: bugfix bugzilla: 187828, https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA
--------------------------------
[ Upstream commit a382f8fee42ca10c9bfce0d2352d4153f931f5dc ]
These are indeed "should not happen" situations, but it turns out recent changes made the 'task_is_stopped_or_trace()' case trigger (fix for that exists, is pending more testing), and the BUG_ON() makes it unnecessarily hard to actually debug for no good reason.
It's been that way for a long time, but let's make it clear: BUG_ON() is not good for debugging, and should never be used in situations where you could just say "this shouldn't happen, but we can continue".
Use WARN_ON_ONCE() instead to make sure it gets logged, and then just continue running. Instead of making the system basically unusuable because you crashed the machine while potentially holding some very core locks (eg this function is commonly called while holding 'tasklist_lock' for writing).
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Xia Fukun xiafukun@huawei.com Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- kernel/signal.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/kernel/signal.c b/kernel/signal.c index d8f810e9fc34..69b9d8bff5a8 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -1831,12 +1831,12 @@ bool do_notify_parent(struct task_struct *tsk, int sig) bool autoreap = false; u64 utime, stime;
- BUG_ON(sig == -1); + WARN_ON_ONCE(sig == -1);
- /* do_notify_parent_cldstop should have been called instead. */ - BUG_ON(task_is_stopped_or_traced(tsk)); + /* do_notify_parent_cldstop should have been called instead. */ + WARN_ON_ONCE(task_is_stopped_or_traced(tsk));
- BUG_ON(!tsk->ptrace && + WARN_ON_ONCE(!tsk->ptrace && (tsk->group_leader != tsk || !thread_group_empty(tsk)));
if (sig != SIGCHLD) {
From: Seth Jenkins sethjenkins@google.com
stable inclusion from stable-v4.19.264 commit dbe863bce7679c7f5ec0e993d834fe16c5e687b5 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA
--------------------------------
Commit 258f669e7e88 ("mm: /proc/pid/smaps_rollup: convert to single value seq_file") introduced a null-deref if there are no vma's in the task in show_smaps_rollup.
Fixes: 258f669e7e88 ("mm: /proc/pid/smaps_rollup: convert to single value seq_file") Signed-off-by: Seth Jenkins sethjenkins@google.com Reviewed-by: Alexey Dobriyan adobriyan@gmail.com Tested-by: Alexey Dobriyan adobriyan@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- fs/proc/task_mmu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 78ce353d0dfa..cfe2c4e32533 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -851,7 +851,7 @@ static int show_smaps_rollup(struct seq_file *m, void *v) last_vma_end = vma->vm_end; }
- show_vma_header_prefix(m, priv->mm->mmap->vm_start, + show_vma_header_prefix(m, priv->mm->mmap ? priv->mm->mmap->vm_start : 0, last_vma_end, 0, 0, 0, 0); seq_pad(m, ' '); seq_puts(m, "[rollup]\n");
From: Rik van Riel riel@surriel.com
stable inclusion from stable-v4.19.264 commit 2b35432d324898ec41beb27031d2a1a864a4d40e category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA
--------------------------------
commit 12df140f0bdfae5dcfc81800970dd7f6f632e00c upstream.
The h->*_huge_pages counters are protected by the hugetlb_lock, but alloc_huge_page has a corner case where it can decrement the counter outside of the lock.
This could lead to a corrupted value of h->resv_huge_pages, which we have observed on our systems.
Take the hugetlb_lock before decrementing h->resv_huge_pages to avoid a potential race.
Link: https://lkml.kernel.org/r/20221017202505.0e6a4fcd@imladris.surriel.com Fixes: a88c76954804 ("mm: hugetlb: fix hugepage memory leak caused by wrong reserve count") Signed-off-by: Rik van Riel riel@surriel.com Reviewed-by: Mike Kravetz mike.kravetz@oracle.com Cc: Naoya Horiguchi n-horiguchi@ah.jp.nec.com Cc: Glen McCready gkmccready@meta.com Cc: Mike Kravetz mike.kravetz@oracle.com Cc: Muchun Song songmuchun@bytedance.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Mike Kravetz mike.kravetz@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- mm/hugetlb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c index e8ef6c62da34..fb1f90eed7fc 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -2271,11 +2271,11 @@ struct page *alloc_huge_page(struct vm_area_struct *vma, page = alloc_buddy_huge_page_with_mpol(h, vma, addr); if (!page) goto out_uncharge_cgroup; + spin_lock(&hugetlb_lock); if (!avoid_reserve && vma_has_reserves(vma, gbl_chg)) { SetPagePrivate(page); h->resv_huge_pages--; } - spin_lock(&hugetlb_lock); list_move(&page->lru, &h->hugepage_activelist); /* Fall through */ }
From: "Christian A. Ehrhardt" lk@c--e.de
stable inclusion from stable-v4.19.264 commit 028cf780743eea79abffa7206b9dcfc080ad3546 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA
--------------------------------
commit 4abc99652812a2ddf932f137515d5c5a04723538 upstream.
Syzkaller managed to trigger concurrent calls to kernfs_remove_by_name_ns() for the same file resulting in a KASAN detected use-after-free. The race occurs when the root node is freed during kernfs_drain().
To prevent this acquire an additional reference for the root of the tree that is removed before calling __kernfs_remove().
Found by syzkaller with the following reproducer (slab_nomerge is required):
syz_mount_image$ext4(0x0, &(0x7f0000000100)='./file0\x00', 0x100000, 0x0, 0x0, 0x0, 0x0) r0 = openat(0xffffffffffffff9c, &(0x7f0000000080)='/proc/self/exe\x00', 0x0, 0x0) close(r0) pipe2(&(0x7f0000000140)={0xffffffffffffffff, <r1=>0xffffffffffffffff}, 0x800) mount$9p_fd(0x0, &(0x7f0000000040)='./file0\x00', &(0x7f00000000c0), 0x408, &(0x7f0000000280)={'trans=fd,', {'rfdno', 0x3d, r0}, 0x2c, {'wfdno', 0x3d, r1}, 0x2c, {[{@cache_loose}, {@mmap}, {@loose}, {@loose}, {@mmap}], [{@mask={'mask', 0x3d, '^MAY_EXEC'}}, {@fsmagic={'fsmagic', 0x3d, 0x10001}}, {@dont_hash}]}})
Sample report:
================================================================== BUG: KASAN: use-after-free in kernfs_type include/linux/kernfs.h:335 [inline] BUG: KASAN: use-after-free in kernfs_leftmost_descendant fs/kernfs/dir.c:1261 [inline] BUG: KASAN: use-after-free in __kernfs_remove.part.0+0x843/0x960 fs/kernfs/dir.c:1369 Read of size 2 at addr ffff8880088807f0 by task syz-executor.2/857
CPU: 0 PID: 857 Comm: syz-executor.2 Not tainted 6.0.0-rc3-00363-g7726d4c3e60b #5 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x6e/0x91 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:317 [inline] print_report.cold+0x5e/0x5e5 mm/kasan/report.c:433 kasan_report+0xa3/0x130 mm/kasan/report.c:495 kernfs_type include/linux/kernfs.h:335 [inline] kernfs_leftmost_descendant fs/kernfs/dir.c:1261 [inline] __kernfs_remove.part.0+0x843/0x960 fs/kernfs/dir.c:1369 __kernfs_remove fs/kernfs/dir.c:1356 [inline] kernfs_remove_by_name_ns+0x108/0x190 fs/kernfs/dir.c:1589 sysfs_slab_add+0x133/0x1e0 mm/slub.c:5943 __kmem_cache_create+0x3e0/0x550 mm/slub.c:4899 create_cache mm/slab_common.c:229 [inline] kmem_cache_create_usercopy+0x167/0x2a0 mm/slab_common.c:335 p9_client_create+0xd4d/0x1190 net/9p/client.c:993 v9fs_session_init+0x1e6/0x13c0 fs/9p/v9fs.c:408 v9fs_mount+0xb9/0xbd0 fs/9p/vfs_super.c:126 legacy_get_tree+0xf1/0x200 fs/fs_context.c:610 vfs_get_tree+0x85/0x2e0 fs/super.c:1530 do_new_mount fs/namespace.c:3040 [inline] path_mount+0x675/0x1d00 fs/namespace.c:3370 do_mount fs/namespace.c:3383 [inline] __do_sys_mount fs/namespace.c:3591 [inline] __se_sys_mount fs/namespace.c:3568 [inline] __x64_sys_mount+0x282/0x300 fs/namespace.c:3568 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f725f983aed Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f725f0f7028 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5 RAX: ffffffffffffffda RBX: 00007f725faa3f80 RCX: 00007f725f983aed RDX: 00000000200000c0 RSI: 0000000020000040 RDI: 0000000000000000 RBP: 00007f725f9f419c R08: 0000000020000280 R09: 0000000000000000 R10: 0000000000000408 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000006 R14: 00007f725faa3f80 R15: 00007f725f0d7000 </TASK>
Allocated by task 855: kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38 kasan_set_track mm/kasan/common.c:45 [inline] set_alloc_info mm/kasan/common.c:437 [inline] __kasan_slab_alloc+0x66/0x80 mm/kasan/common.c:470 kasan_slab_alloc include/linux/kasan.h:224 [inline] slab_post_alloc_hook mm/slab.h:727 [inline] slab_alloc_node mm/slub.c:3243 [inline] slab_alloc mm/slub.c:3251 [inline] __kmem_cache_alloc_lru mm/slub.c:3258 [inline] kmem_cache_alloc+0xbf/0x200 mm/slub.c:3268 kmem_cache_zalloc include/linux/slab.h:723 [inline] __kernfs_new_node+0xd4/0x680 fs/kernfs/dir.c:593 kernfs_new_node fs/kernfs/dir.c:655 [inline] kernfs_create_dir_ns+0x9c/0x220 fs/kernfs/dir.c:1010 sysfs_create_dir_ns+0x127/0x290 fs/sysfs/dir.c:59 create_dir lib/kobject.c:63 [inline] kobject_add_internal+0x24a/0x8d0 lib/kobject.c:223 kobject_add_varg lib/kobject.c:358 [inline] kobject_init_and_add+0x101/0x160 lib/kobject.c:441 sysfs_slab_add+0x156/0x1e0 mm/slub.c:5954 __kmem_cache_create+0x3e0/0x550 mm/slub.c:4899 create_cache mm/slab_common.c:229 [inline] kmem_cache_create_usercopy+0x167/0x2a0 mm/slab_common.c:335 p9_client_create+0xd4d/0x1190 net/9p/client.c:993 v9fs_session_init+0x1e6/0x13c0 fs/9p/v9fs.c:408 v9fs_mount+0xb9/0xbd0 fs/9p/vfs_super.c:126 legacy_get_tree+0xf1/0x200 fs/fs_context.c:610 vfs_get_tree+0x85/0x2e0 fs/super.c:1530 do_new_mount fs/namespace.c:3040 [inline] path_mount+0x675/0x1d00 fs/namespace.c:3370 do_mount fs/namespace.c:3383 [inline] __do_sys_mount fs/namespace.c:3591 [inline] __se_sys_mount fs/namespace.c:3568 [inline] __x64_sys_mount+0x282/0x300 fs/namespace.c:3568 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
Freed by task 857: kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38 kasan_set_track+0x21/0x30 mm/kasan/common.c:45 kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:370 ____kasan_slab_free mm/kasan/common.c:367 [inline] ____kasan_slab_free mm/kasan/common.c:329 [inline] __kasan_slab_free+0x108/0x190 mm/kasan/common.c:375 kasan_slab_free include/linux/kasan.h:200 [inline] slab_free_hook mm/slub.c:1754 [inline] slab_free_freelist_hook mm/slub.c:1780 [inline] slab_free mm/slub.c:3534 [inline] kmem_cache_free+0x9c/0x340 mm/slub.c:3551 kernfs_put.part.0+0x2b2/0x520 fs/kernfs/dir.c:547 kernfs_put+0x42/0x50 fs/kernfs/dir.c:521 __kernfs_remove.part.0+0x72d/0x960 fs/kernfs/dir.c:1407 __kernfs_remove fs/kernfs/dir.c:1356 [inline] kernfs_remove_by_name_ns+0x108/0x190 fs/kernfs/dir.c:1589 sysfs_slab_add+0x133/0x1e0 mm/slub.c:5943 __kmem_cache_create+0x3e0/0x550 mm/slub.c:4899 create_cache mm/slab_common.c:229 [inline] kmem_cache_create_usercopy+0x167/0x2a0 mm/slab_common.c:335 p9_client_create+0xd4d/0x1190 net/9p/client.c:993 v9fs_session_init+0x1e6/0x13c0 fs/9p/v9fs.c:408 v9fs_mount+0xb9/0xbd0 fs/9p/vfs_super.c:126 legacy_get_tree+0xf1/0x200 fs/fs_context.c:610 vfs_get_tree+0x85/0x2e0 fs/super.c:1530 do_new_mount fs/namespace.c:3040 [inline] path_mount+0x675/0x1d00 fs/namespace.c:3370 do_mount fs/namespace.c:3383 [inline] __do_sys_mount fs/namespace.c:3591 [inline] __se_sys_mount fs/namespace.c:3568 [inline] __x64_sys_mount+0x282/0x300 fs/namespace.c:3568 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
The buggy address belongs to the object at ffff888008880780 which belongs to the cache kernfs_node_cache of size 128 The buggy address is located 112 bytes inside of 128-byte region [ffff888008880780, ffff888008880800)
The buggy address belongs to the physical page: page:00000000732833f8 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8880 flags: 0x100000000000200(slab|node=0|zone=1) raw: 0100000000000200 0000000000000000 dead000000000122 ffff888001147280 raw: 0000000000000000 0000000000150015 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected
Memory state around the buggy address: ffff888008880680: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb ffff888008880700: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
ffff888008880780: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^ ffff888008880800: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb ffff888008880880: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ==================================================================
Acked-by: Tejun Heo tj@kernel.org Cc: stable stable@kernel.org # -rc3 Signed-off-by: Christian A. Ehrhardt lk@c--e.de Link: https://lore.kernel.org/r/20220913121723.691454-1-lk@c--e.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- fs/kernfs/dir.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c index 7fa940399144..7a93449a589d 100644 --- a/fs/kernfs/dir.c +++ b/fs/kernfs/dir.c @@ -1505,8 +1505,11 @@ int kernfs_remove_by_name_ns(struct kernfs_node *parent, const char *name, mutex_lock(&kernfs_mutex);
kn = kernfs_find_ns(parent, name, ns); - if (kn) + if (kn) { + kernfs_get(kn); __kernfs_remove(kn); + kernfs_put(kn); + }
mutex_unlock(&kernfs_mutex);
From: Zhang Xiaoxu zhangxiaoxu5@huawei.com
stable inclusion from stable-v4.19.265 commit 86ce0e93cf6fb4d0c447323ac66577c642628b9d category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA
--------------------------------
[ Upstream commit 7e8436728e22181c3f12a5dbabd35ed3a8b8c593 ]
If one of the slot allocate failed, should cleanup all the other allocated slots, otherwise, the allocated slots will leak:
unreferenced object 0xffff8881115aa100 (size 64): comm ""mount.nfs"", pid 679, jiffies 4294744957 (age 115.037s) hex dump (first 32 bytes): 00 cc 19 73 81 88 ff ff 00 a0 5a 11 81 88 ff ff ...s......Z..... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<000000007a4c434a>] nfs4_find_or_create_slot+0x8e/0x130 [<000000005472a39c>] nfs4_realloc_slot_table+0x23f/0x270 [<00000000cd8ca0eb>] nfs40_init_client+0x4a/0x90 [<00000000128486db>] nfs4_init_client+0xce/0x270 [<000000008d2cacad>] nfs4_set_client+0x1a2/0x2b0 [<000000000e593b52>] nfs4_create_server+0x300/0x5f0 [<00000000e4425dd2>] nfs4_try_get_tree+0x65/0x110 [<00000000d3a6176f>] vfs_get_tree+0x41/0xf0 [<0000000016b5ad4c>] path_mount+0x9b3/0xdd0 [<00000000494cae71>] __x64_sys_mount+0x190/0x1d0 [<000000005d56bdec>] do_syscall_64+0x35/0x80 [<00000000687c9ae4>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
Fixes: abf79bb341bf ("NFS: Add a slot table to struct nfs_client for NFSv4.0 transport blocking") Signed-off-by: Zhang Xiaoxu zhangxiaoxu5@huawei.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- fs/nfs/nfs4client.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c index 60999b261666..1350ea673672 100644 --- a/fs/nfs/nfs4client.c +++ b/fs/nfs/nfs4client.c @@ -340,6 +340,7 @@ int nfs40_init_client(struct nfs_client *clp) ret = nfs4_setup_slot_table(tbl, NFS4_MAX_SLOT_TABLE, "NFSv4.0 transport Slot table"); if (ret) { + nfs4_shutdown_slot_table(tbl); kfree(tbl); return ret; }
From: Alexander Potapenko glider@google.com
stable inclusion from stable-v4.19.267 commit 8a5be2948f350d34b1f6acb9ca3be4c89359a057 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA
--------------------------------
commit 1468c6f4558b1bcd92aa0400f2920f9dc7588402 upstream.
Functions implementing the a_ops->write_end() interface accept the `void *fsdata` parameter that is supposed to be initialized by the corresponding a_ops->write_begin() (which accepts `void **fsdata`).
However not all a_ops->write_begin() implementations initialize `fsdata` unconditionally, so it may get passed uninitialized to a_ops->write_end(), resulting in undefined behavior.
Fix this by initializing fsdata with NULL before the call to write_begin(), rather than doing so in all possible a_ops implementations.
This patch covers only the following cases found by running x86 KMSAN under syzkaller:
- generic_perform_write() - cont_expand_zero() and generic_cont_expand_simple() - page_symlink()
Other cases of passing uninitialized fsdata may persist in the codebase.
Link: https://lkml.kernel.org/r/20220915150417.722975-43-glider@google.com Signed-off-by: Alexander Potapenko glider@google.com Cc: Alexander Viro viro@zeniv.linux.org.uk Cc: Alexei Starovoitov ast@kernel.org Cc: Andrey Konovalov andreyknvl@gmail.com Cc: Andrey Konovalov andreyknvl@google.com Cc: Andy Lutomirski luto@kernel.org Cc: Arnd Bergmann arnd@arndb.de Cc: Borislav Petkov bp@alien8.de Cc: Christoph Hellwig hch@lst.de Cc: Christoph Lameter cl@linux.com Cc: David Rientjes rientjes@google.com Cc: Dmitry Vyukov dvyukov@google.com Cc: Eric Biggers ebiggers@google.com Cc: Eric Biggers ebiggers@kernel.org Cc: Eric Dumazet edumazet@google.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Herbert Xu herbert@gondor.apana.org.au Cc: Ilya Leoshkevich iii@linux.ibm.com Cc: Ingo Molnar mingo@redhat.com Cc: Jens Axboe axboe@kernel.dk Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Kees Cook keescook@chromium.org Cc: Marco Elver elver@google.com Cc: Mark Rutland mark.rutland@arm.com Cc: Matthew Wilcox willy@infradead.org Cc: Michael S. Tsirkin mst@redhat.com Cc: Pekka Enberg penberg@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Petr Mladek pmladek@suse.com Cc: Stephen Rothwell sfr@canb.auug.org.au Cc: Steven Rostedt rostedt@goodmis.org Cc: Thomas Gleixner tglx@linutronix.de Cc: Vasily Gorbik gor@linux.ibm.com Cc: Vegard Nossum vegard.nossum@oracle.com Cc: Vlastimil Babka vbabka@suse.cz Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- fs/buffer.c | 4 ++-- fs/namei.c | 2 +- mm/filemap.c | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c index 2a213e8bb97c..ea8a7b6efdf5 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -2322,7 +2322,7 @@ int generic_cont_expand_simple(struct inode *inode, loff_t size) { struct address_space *mapping = inode->i_mapping; struct page *page; - void *fsdata; + void *fsdata = NULL; int err;
err = inode_newsize_ok(inode, size); @@ -2348,7 +2348,7 @@ static int cont_expand_zero(struct file *file, struct address_space *mapping, struct inode *inode = mapping->host; unsigned int blocksize = i_blocksize(inode); struct page *page; - void *fsdata; + void *fsdata = NULL; pgoff_t index, curidx; loff_t curpos; unsigned zerofrom, offset, len; diff --git a/fs/namei.c b/fs/namei.c index fb87793e9adf..6144c6434280 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -4833,7 +4833,7 @@ int __page_symlink(struct inode *inode, const char *symname, int len, int nofs) { struct address_space *mapping = inode->i_mapping; struct page *page; - void *fsdata; + void *fsdata = NULL; int err; unsigned int flags = 0; if (nofs) diff --git a/mm/filemap.c b/mm/filemap.c index 9b6e72e14a04..f33504b784a2 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -3250,7 +3250,7 @@ ssize_t generic_perform_write(struct file *file, unsigned long offset; /* Offset into pagecache page */ unsigned long bytes; /* Bytes to write to page */ size_t copied; /* Bytes copied from user */ - void *fsdata; + void *fsdata = NULL;
offset = (pos & (PAGE_SIZE - 1)); bytes = min_t(unsigned long, PAGE_SIZE - offset,
From: Li Qiang liq3ea@163.com
stable inclusion from stable-v4.19.265 commit d608ed66abfaccc233404be2583ab89c37e560fc category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA
--------------------------------
commit 4a6f316d6855a434f56dbbeba05e14c01acde8f8 upstream.
In aggregate kprobe case, when arm_kprobe failed, we need set the kp->flags with KPROBE_FLAG_DISABLED again. If not, the 'kp' kprobe will been considered as enabled but it actually not enabled.
Link: https://lore.kernel.org/all/20220902155820.34755-1-liq3ea@163.com/
Fixes: 12310e343755 ("kprobes: Propagate error from arm_kprobe_ftrace()") Cc: stable@vger.kernel.org Signed-off-by: Li Qiang liq3ea@163.com Acked-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- kernel/kprobes.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/kernel/kprobes.c b/kernel/kprobes.c index 2be52a2d24ad..877fe4b36dfe 100644 --- a/kernel/kprobes.c +++ b/kernel/kprobes.c @@ -2173,8 +2173,11 @@ int enable_kprobe(struct kprobe *kp) if (!kprobes_all_disarmed && kprobe_disabled(p)) { p->flags &= ~KPROBE_FLAG_DISABLED; ret = arm_kprobe(p); - if (ret) + if (ret) { p->flags |= KPROBE_FLAG_DISABLED; + if (p != kp) + kp->flags |= KPROBE_FLAG_DISABLED; + } } out: mutex_unlock(&kprobe_mutex);
From: Wang Wensheng wangwensheng4@huawei.com
stable inclusion from stable-v4.19.267 commit d110bb57a7e9831465aa3abb6c0d1cc658b05fbe category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA
--------------------------------
commit bcea02b096333dc74af987cb9685a4dbdd820840 upstream.
If we can't allocate this size, try something smaller with half of the size. Its order should be decreased by one instead of divided by two.
Link: https://lkml.kernel.org/r/20221109094434.84046-3-wangwensheng4@huawei.com
Cc: mhiramat@kernel.org Cc: mark.rutland@arm.com Cc: stable@vger.kernel.org Fixes: a79008755497d ("ftrace: Allocate the mcount record pages as groups") Signed-off-by: Wang Wensheng wangwensheng4@huawei.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- kernel/trace/ftrace.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c index e9d112972c7f..e5dd0a5aa088 100644 --- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -3023,7 +3023,7 @@ static int ftrace_allocate_records(struct ftrace_page *pg, int count) /* if we can't allocate this size, try something smaller */ if (!order) return -ENOMEM; - order >>= 1; + order--; goto again; }
From: Xiu Jianfeng xiujianfeng@huawei.com
stable inclusion from stable-v4.19.267 commit b5bfc61f541d3f092b13dedcfe000d86eb8e133c category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA
--------------------------------
commit 19ba6c8af9382c4c05dc6a0a79af3013b9a35cd0 upstream.
The @ftrace_mod is allocated by kzalloc(), so both the members {prev,next} of @ftrace_mode->list are NULL, it's not a valid state to call list_del(). If kstrdup() for @ftrace_mod->{func|module} fails, it goes to @out_free tag and calls free_ftrace_mod() to destroy @ftrace_mod, then list_del() will write prev->next and next->prev, where null pointer dereference happens.
BUG: kernel NULL pointer dereference, address: 0000000000000008 Oops: 0002 [#1] PREEMPT SMP NOPTI Call Trace: <TASK> ftrace_mod_callback+0x20d/0x220 ? do_filp_open+0xd9/0x140 ftrace_process_regex.isra.51+0xbf/0x130 ftrace_regex_write.isra.52.part.53+0x6e/0x90 vfs_write+0xee/0x3a0 ? __audit_filter_op+0xb1/0x100 ? auditd_test_task+0x38/0x50 ksys_write+0xa5/0xe0 do_syscall_64+0x3a/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd Kernel panic - not syncing: Fatal exception
So call INIT_LIST_HEAD() to initialize the list member to fix this issue.
Link: https://lkml.kernel.org/r/20221116015207.30858-1-xiujianfeng@huawei.com
Cc: stable@vger.kernel.org Fixes: 673feb9d76ab ("ftrace: Add :mod: caching infrastructure to trace_array") Signed-off-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- kernel/trace/ftrace.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c index e5dd0a5aa088..3962c6be47f5 100644 --- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -1331,6 +1331,7 @@ static int ftrace_add_mod(struct trace_array *tr, if (!ftrace_mod) return -ENOMEM;
+ INIT_LIST_HEAD(&ftrace_mod->list); ftrace_mod->func = kstrdup(func, GFP_KERNEL); ftrace_mod->module = kstrdup(module, GFP_KERNEL); ftrace_mod->enable = enable;
From: Daniil Tatianin d-tatianin@yandex-team.ru
stable inclusion from stable-v4.19.267 commit 455ea324770205525cbc13f155806a5346794339 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA
--------------------------------
commit 56f4ca0a79a9f1af98f26c54b9b89ba1f9bcc6bd upstream.
rb_head_page_deactivate() expects cpu_buffer to contain a valid list of ->pages, so verify that the list is actually present before calling it.
Found by Linux Verification Center (linuxtesting.org) with the SVACE static analysis tool.
Link: https://lkml.kernel.org/r/20221114143129.3534443-1-d-tatianin@yandex-team.ru
Cc: stable@vger.kernel.org Fixes: 77ae365eca895 ("ring-buffer: make lockless") Signed-off-by: Daniil Tatianin d-tatianin@yandex-team.ru Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- kernel/trace/ring_buffer.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 987d3447bf2a..599a9cf12772 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -1327,9 +1327,9 @@ static void rb_free_cpu_buffer(struct ring_buffer_per_cpu *cpu_buffer)
free_buffer_page(cpu_buffer->reader_page);
- rb_head_page_deactivate(cpu_buffer); - if (head) { + rb_head_page_deactivate(cpu_buffer); + list_for_each_entry_safe(bpage, tmp, head, list) { list_del_init(&bpage->list); free_buffer_page(bpage);
From: Arnd Bergmann arnd@arndb.de
stable inclusion from stable-v4.19.191 commit 2f34dd12fd7a28888286924d74c0313532bc52d8 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA
--------------------------------
commit 82e5d8cc768b0c7b03c551a9ab1f8f3f68d5f83f upstream.
gcc-11 introdces a harmless warning for cap_inode_getsecurity:
security/commoncap.c: In function ‘cap_inode_getsecurity’: security/commoncap.c:440:33: error: ‘memcpy’ reading 16 bytes from a region of size 0 [-Werror=stringop-overread] 440 | memcpy(&nscap->data, &cap->data, sizeof(__le32) * 2 * VFS_CAP_U32); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The problem here is that tmpbuf is initialized to NULL, so gcc assumes it is not accessible unless it gets set by vfs_getxattr_alloc(). This is a legitimate warning as far as I can tell, but the code is correct since it correctly handles the error when that function fails.
Add a separate NULL check to tell gcc about it as well.
Signed-off-by: Arnd Bergmann arnd@arndb.de Acked-by: Christian Brauner christian.brauner@ubuntu.com Signed-off-by: James Morris jamorris@linux.microsoft.com Cc: Andrey Zhizhikin andrey.z@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- security/commoncap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/security/commoncap.c b/security/commoncap.c index 2b221c0ef502..1fd370e8820f 100644 --- a/security/commoncap.c +++ b/security/commoncap.c @@ -398,7 +398,7 @@ int cap_inode_getsecurity(struct inode *inode, const char *name, void **buffer, &tmpbuf, size, GFP_NOFS); dput(dentry);
- if (ret < 0) + if (ret < 0 || !tmpbuf) return ret;
fs_ns = inode->i_sb->s_user_ns;
From: Gaosheng Cui cuigaosheng1@huawei.com
stable inclusion from stable-v4.19.265 commit 90577bcc01c4188416a47269f8433f70502abe98 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA
--------------------------------
commit 8cf0a1bc12870d148ae830a4ba88cfdf0e879cee upstream.
In cap_inode_getsecurity(), we will use vfs_getxattr_alloc() to complete the memory allocation of tmpbuf, if we have completed the memory allocation of tmpbuf, but failed to call handler->get(...), there will be a memleak in below logic:
|-- ret = (int)vfs_getxattr_alloc(mnt_userns, ...) | /* ^^^ alloc for tmpbuf */ |-- value = krealloc(*xattr_value, error + 1, flags) | /* ^^^ alloc memory */ |-- error = handler->get(handler, ...) | /* error! */ |-- *xattr_value = value | /* xattr_value is &tmpbuf (memory leak!) */
So we will try to free(tmpbuf) after vfs_getxattr_alloc() fails to fix it.
Cc: stable@vger.kernel.org Fixes: 8db6c34f1dbc ("Introduce v3 namespaced file capabilities") Signed-off-by: Gaosheng Cui cuigaosheng1@huawei.com Acked-by: Serge Hallyn serge@hallyn.com [PM: subject line and backtrace tweaks] Signed-off-by: Paul Moore paul@paul-moore.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- security/commoncap.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/security/commoncap.c b/security/commoncap.c index 1fd370e8820f..ca35b508d301 100644 --- a/security/commoncap.c +++ b/security/commoncap.c @@ -398,8 +398,10 @@ int cap_inode_getsecurity(struct inode *inode, const char *name, void **buffer, &tmpbuf, size, GFP_NOFS); dput(dentry);
- if (ret < 0 || !tmpbuf) - return ret; + if (ret < 0 || !tmpbuf) { + size = ret; + goto out_free; + }
fs_ns = inode->i_sb->s_user_ns; cap = (struct vfs_cap_data *) tmpbuf;
From: Ilpo Järvinen ilpo.jarvinen@linux.intel.com
stable inclusion from stable-v4.19.267 commit 62cda857457c7de0922852d54d69b140bd6eeb7e category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA
--------------------------------
commit a931237cbea256aff13bb403da13a97b2d1605d9 upstream.
DW UART sometimes triggers IIR_RDI during DMA Rx when IIR_RX_TIMEOUT should have been triggered instead. Since IIR_RDI has higher priority than IIR_RX_TIMEOUT, this causes the Rx to hang into interrupt loop. The problem seems to occur at least with some combinations of small-sized transfers (I've reproduced the problem on Elkhart Lake PSE UARTs).
If there's already an on-going Rx DMA and IIR_RDI triggers, fall graciously back to non-DMA Rx. That is, behave as if IIR_RX_TIMEOUT had occurred.
8250_omap already considers IIR_RDI similar to this change so its nothing unheard of.
Fixes: 75df022b5f89 ("serial: 8250_dma: Fix RX handling") Cc: stable@vger.kernel.org Co-developed-by: Srikanth Thokala srikanth.thokala@intel.com Signed-off-by: Srikanth Thokala srikanth.thokala@intel.com Co-developed-by: Aman Kumar aman.kumar@intel.com Signed-off-by: Aman Kumar aman.kumar@intel.com Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Reviewed-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Link: https://lore.kernel.org/r/20221108121952.5497-2-ilpo.jarvinen@linux.intel.co... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- drivers/tty/serial/8250/8250_port.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c index d0e9dd9f9ef8..42d7799d58f7 100644 --- a/drivers/tty/serial/8250/8250_port.c +++ b/drivers/tty/serial/8250/8250_port.c @@ -1864,6 +1864,10 @@ EXPORT_SYMBOL_GPL(serial8250_modem_status); static bool handle_rx_dma(struct uart_8250_port *up, unsigned int iir) { switch (iir & 0x3f) { + case UART_IIR_RDI: + if (!up->dma->rx_running) + break; + /* fall-through */ case UART_IIR_RX_TIMEOUT: serial8250_rx_dma_flush(up); /* fall-through */
From: Ilpo Järvinen ilpo.jarvinen@linux.intel.com
stable inclusion from stable-v4.19.267 commit 40f5fa26c11bca5c947d218cc4fe6e0c64932070 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA
--------------------------------
commit 1980860e0c8299316cddaf0992dd9e1258ec9d88 upstream.
Returning true from handle_rx_dma() without flushing DMA first creates a data ordering hazard. If DMA Rx has handled any character at the point when RLSI occurs, the non-DMA path handles any pending characters jumping them ahead of those characters that are pending under DMA.
Fixes: 75df022b5f89 ("serial: 8250_dma: Fix RX handling") Cc: stable@vger.kernel.org Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Reviewed-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Link: https://lore.kernel.org/r/20221108121952.5497-5-ilpo.jarvinen@linux.intel.co... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- drivers/tty/serial/8250/8250_port.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c index 42d7799d58f7..238900b1f995 100644 --- a/drivers/tty/serial/8250/8250_port.c +++ b/drivers/tty/serial/8250/8250_port.c @@ -1868,10 +1868,9 @@ static bool handle_rx_dma(struct uart_8250_port *up, unsigned int iir) if (!up->dma->rx_running) break; /* fall-through */ + case UART_IIR_RLSI: case UART_IIR_RX_TIMEOUT: serial8250_rx_dma_flush(up); - /* fall-through */ - case UART_IIR_RLSI: return true; } return up->dma->rx_dma(up);
From: Zhengchao Shao shaozhengchao@huawei.com
stable inclusion from stable-v4.19.264 commit 5a2ea549be94924364f6911227d99be86e8cf34a category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA
--------------------------------
[ Upstream commit d266935ac43d57586e311a087510fe6a084af742 ]
When the ops_init() interface is invoked to initialize the net, but ops->init() fails, data is released. However, the ptr pointer in net->gen is invalid. In this case, when nfqnl_nf_hook_drop() is invoked to release the net, invalid address access occurs.
The process is as follows: setup_net() ops_init() data = kzalloc(...) ---> alloc "data" net_assign_generic() ---> assign "date" to ptr in net->gen ... ops->init() ---> failed ... kfree(data); ---> ptr in net->gen is invalid ... ops_exit_list() ... nfqnl_nf_hook_drop() *q = nfnl_queue_pernet(net) ---> q is invalid
The following is the Call Trace information: BUG: KASAN: use-after-free in nfqnl_nf_hook_drop+0x264/0x280 Read of size 8 at addr ffff88810396b240 by task ip/15855 Call Trace: <TASK> dump_stack_lvl+0x8e/0xd1 print_report+0x155/0x454 kasan_report+0xba/0x1f0 nfqnl_nf_hook_drop+0x264/0x280 nf_queue_nf_hook_drop+0x8b/0x1b0 __nf_unregister_net_hook+0x1ae/0x5a0 nf_unregister_net_hooks+0xde/0x130 ops_exit_list+0xb0/0x170 setup_net+0x7ac/0xbd0 copy_net_ns+0x2e6/0x6b0 create_new_namespaces+0x382/0xa50 unshare_nsproxy_namespaces+0xa6/0x1c0 ksys_unshare+0x3a4/0x7e0 __x64_sys_unshare+0x2d/0x40 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 </TASK>
Allocated by task 15855: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 __kasan_kmalloc+0xa1/0xb0 __kmalloc+0x49/0xb0 ops_init+0xe7/0x410 setup_net+0x5aa/0xbd0 copy_net_ns+0x2e6/0x6b0 create_new_namespaces+0x382/0xa50 unshare_nsproxy_namespaces+0xa6/0x1c0 ksys_unshare+0x3a4/0x7e0 __x64_sys_unshare+0x2d/0x40 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0
Freed by task 15855: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_save_free_info+0x2a/0x40 ____kasan_slab_free+0x155/0x1b0 slab_free_freelist_hook+0x11b/0x220 __kmem_cache_free+0xa4/0x360 ops_init+0xb9/0x410 setup_net+0x5aa/0xbd0 copy_net_ns+0x2e6/0x6b0 create_new_namespaces+0x382/0xa50 unshare_nsproxy_namespaces+0xa6/0x1c0 ksys_unshare+0x3a4/0x7e0 __x64_sys_unshare+0x2d/0x40 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0
Fixes: f875bae06533 ("net: Automatically allocate per namespace data.") Signed-off-by: Zhengchao Shao shaozhengchao@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/core/net_namespace.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c index f4aacbb76fa5..123094869b34 100644 --- a/net/core/net_namespace.c +++ b/net/core/net_namespace.c @@ -112,6 +112,7 @@ static int net_assign_generic(struct net *net, unsigned int id, void *data)
static int ops_init(const struct pernet_operations *ops, struct net *net) { + struct net_generic *ng; int err = -ENOMEM; void *data = NULL;
@@ -130,7 +131,13 @@ static int ops_init(const struct pernet_operations *ops, struct net *net) if (!err) return 0;
+ if (ops->id && ops->size) { cleanup: + ng = rcu_dereference_protected(net->gen, + lockdep_is_held(&pernet_ops_rwsem)); + ng->ptr[*ops->id] = NULL; + } + kfree(data);
out:
From: Neal Cardwell ncardwell@google.com
stable inclusion from stable-v4.19.264 commit 633da7b30b240000b1d9b690e43848406a0d060f category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA
--------------------------------
[ Upstream commit 3d2af9cce3133b3bc596a9d065c6f9d93419ccfb ]
This commit fixes a bug that can cause a TCP data sender to repeatedly defer RTOs when encountering SACK reneging.
The bug is that when we're in fast recovery in a scenario with SACK reneging, every time we get an ACK we call tcp_check_sack_reneging() and it can note the apparent SACK reneging and rearm the RTO timer for srtt/2 into the future. In some SACK reneging scenarios that can happen repeatedly until the receive window fills up, at which point the sender can't send any more, the ACKs stop arriving, and the RTO fires at srtt/2 after the last ACK. But that can take far too long (O(10 secs)), since the connection is stuck in fast recovery with a low cwnd that cannot grow beyond ssthresh, even if more bandwidth is available.
This fix changes the logic in tcp_check_sack_reneging() to only rearm the RTO timer if data is cumulatively ACKed, indicating forward progress. This avoids this kind of nearly infinite loop of RTO timer re-arming. In addition, this meets the goals of tcp_check_sack_reneging() in handling Windows TCP behavior that looks temporarily like SACK reneging but is not really.
Many thanks to Jakub Kicinski and Neil Spring, who reported this issue and provided critical packet traces that enabled root-causing this issue. Also, many thanks to Jakub Kicinski for testing this fix.
Fixes: 5ae344c949e7 ("tcp: reduce spurious retransmits due to transient SACK reneging") Reported-by: Jakub Kicinski kuba@kernel.org Reported-by: Neil Spring ntspring@fb.com Signed-off-by: Neal Cardwell ncardwell@google.com Reviewed-by: Eric Dumazet edumazet@google.com Cc: Yuchung Cheng ycheng@google.com Tested-by: Jakub Kicinski kuba@kernel.org Link: https://lore.kernel.org/r/20221021170821.1093930-1-ncardwell.kernel@gmail.co... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv4/tcp_input.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 9cfd25e20be9..6272d5a44338 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -2039,7 +2039,8 @@ void tcp_enter_loss(struct sock *sk) */ static bool tcp_check_sack_reneging(struct sock *sk, int flag) { - if (flag & FLAG_SACK_RENEGING) { + if (flag & FLAG_SACK_RENEGING && + flag & FLAG_SND_UNA_ADVANCED) { struct tcp_sock *tp = tcp_sk(sk); unsigned long delay = max(usecs_to_jiffies(tp->srtt_us >> 4), msecs_to_jiffies(10));
From: Chen Zhongjin chenzhongjin@huawei.com
stable inclusion from stable-v4.19.265 commit b736592de2aa53aee2d48d6b129bc0c892007bbe category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA
--------------------------------
[ Upstream commit f8017317cb0b279b8ab98b0f3901a2e0ac880dad ]
When IPv6 module gets initialized but hits an error in the middle, kenel panic with:
KASAN: null-ptr-deref in range [0x0000000000000598-0x000000000000059f] CPU: 1 PID: 361 Comm: insmod Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) RIP: 0010:__neigh_ifdown.isra.0+0x24b/0x370 RSP: 0018:ffff888012677908 EFLAGS: 00000202 ... Call Trace: <TASK> neigh_table_clear+0x94/0x2d0 ndisc_cleanup+0x27/0x40 [ipv6] inet6_init+0x21c/0x2cb [ipv6] do_one_initcall+0xd3/0x4d0 do_init_module+0x1ae/0x670 ... Kernel panic - not syncing: Fatal exception
When ipv6 initialization fails, it will try to cleanup and calls:
neigh_table_clear() neigh_ifdown(tbl, NULL) pneigh_queue_purge(&tbl->proxy_queue, dev_net(dev == NULL)) # dev_net(NULL) triggers null-ptr-deref.
Fix it by passing NULL to pneigh_queue_purge() in neigh_ifdown() if dev is NULL, to make kernel not panic immediately.
Fixes: 66ba215cb513 ("neigh: fix possible DoS due to net iface start/stop loop") Signed-off-by: Chen Zhongjin chenzhongjin@huawei.com Reviewed-by: Eric Dumazet edumazet@google.com Reviewed-by: Denis V. Lunev den@openvz.org Link: https://lore.kernel.org/r/20221101121552.21890-1-chenzhongjin@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/core/neighbour.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/core/neighbour.c b/net/core/neighbour.c index 73042407eb5b..2b96e9a7fc59 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -312,7 +312,7 @@ int neigh_ifdown(struct neigh_table *tbl, struct net_device *dev) write_lock_bh(&tbl->lock); neigh_flush_dev(tbl, dev); pneigh_ifdown_and_unlock(tbl, dev); - pneigh_queue_purge(&tbl->proxy_queue, dev_net(dev)); + pneigh_queue_purge(&tbl->proxy_queue, dev ? dev_net(dev) : NULL); if (skb_queue_empty_lockless(&tbl->proxy_queue)) del_timer_sync(&tbl->proxy_timer); return 0;
From: Zhengchao Shao shaozhengchao@huawei.com
stable inclusion from stable-v4.19.265 commit 83fbf246ced54dadd7b9adc2a16efeff30ba944d category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA
--------------------------------
[ Upstream commit 768b3c745fe5789f2430bdab02f35a9ad1148d97 ]
During the initialization of ip6_route_net_init_late(), if file ipv6_route or rt6_stats fails to be created, the initialization is successful by default. Therefore, the ipv6_route or rt6_stats file doesn't be found during the remove in ip6_route_net_exit_late(). It will cause WRNING.
The following is the stack information: name 'rt6_stats' WARNING: CPU: 0 PID: 9 at fs/proc/generic.c:712 remove_proc_entry+0x389/0x460 Modules linked in: Workqueue: netns cleanup_net RIP: 0010:remove_proc_entry+0x389/0x460 PKRU: 55555554 Call Trace: <TASK> ops_exit_list+0xb0/0x170 cleanup_net+0x4ea/0xb00 process_one_work+0x9bf/0x1710 worker_thread+0x665/0x1080 kthread+0x2e4/0x3a0 ret_from_fork+0x1f/0x30 </TASK>
Fixes: cdb1876192db ("[NETNS][IPV6] route6 - create route6 proc files for the namespace") Signed-off-by: Zhengchao Shao shaozhengchao@huawei.com Reviewed-by: Eric Dumazet edumazet@google.com Link: https://lore.kernel.org/r/20221102020610.351330-1-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv6/route.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 31c285d0f8c3..7b2ed1fb747e 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -5392,10 +5392,16 @@ static void __net_exit ip6_route_net_exit(struct net *net) static int __net_init ip6_route_net_init_late(struct net *net) { #ifdef CONFIG_PROC_FS - proc_create_net("ipv6_route", 0, net->proc_net, &ipv6_route_seq_ops, - sizeof(struct ipv6_route_iter)); - proc_create_net_single("rt6_stats", 0444, net->proc_net, - rt6_stats_seq_show, NULL); + if (!proc_create_net("ipv6_route", 0, net->proc_net, + &ipv6_route_seq_ops, + sizeof(struct ipv6_route_iter))) + return -ENOMEM; + + if (!proc_create_net_single("rt6_stats", 0444, net->proc_net, + rt6_stats_seq_show, NULL)) { + remove_proc_entry("ipv6_route", net->proc_net); + return -ENOMEM; + } #endif return 0; }
From: Kuniyuki Iwashima kuniyu@amazon.com
stable inclusion from stable-v4.19.265 commit 7162f05f1f0f2b9554ea207842a1389c067cc1db category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA
--------------------------------
commit 11052589cf5c0bab3b4884d423d5f60c38fcf25d upstream.
Commit e21145a9871a ("ipv4: namespacify ip_early_demux sysctl knob") made it possible to enable/disable early_demux on a per-netns basis. Then, we introduced two knobs, tcp_early_demux and udp_early_demux, to switch it for TCP/UDP in commit dddb64bcb346 ("net: Add sysctl to toggle early demux for tcp and udp"). However, the .proc_handler() was wrong and actually disabled us from changing the behaviour in each netns.
We can execute early_demux if net.ipv4.ip_early_demux is on and each proto .early_demux() handler is not NULL. When we toggle (tcp|udp)_early_demux, the change itself is saved in each netns variable, but the .early_demux() handler is a global variable, so the handler is switched based on the init_net's sysctl variable. Thus, netns (tcp|udp)_early_demux knobs have nothing to do with the logic. Whether we CAN execute proto .early_demux() is always decided by init_net's sysctl knob, and whether we DO it or not is by each netns ip_early_demux knob.
This patch namespacifies (tcp|udp)_early_demux again. For now, the users of the .early_demux() handler are TCP and UDP only, and they are called directly to avoid retpoline. So, we can remove the .early_demux() handler from inet6?_protos and need not dereference them in ip6?_rcv_finish_core(). If another proto needs .early_demux(), we can restore it at that time.
Fixes: dddb64bcb346 ("net: Add sysctl to toggle early demux for tcp and udp") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://lore.kernel.org/r/20220713175207.7727-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- include/net/protocol.h | 4 --- include/net/tcp.h | 2 ++ include/net/udp.h | 1 + net/ipv4/af_inet.c | 14 ++------- net/ipv4/ip_input.c | 32 ++++++++++++++------- net/ipv4/sysctl_net_ipv4.c | 59 ++------------------------------------ net/ipv6/ip6_input.c | 23 +++++++++------ net/ipv6/tcp_ipv6.c | 9 ++---- net/ipv6/udp.c | 9 ++---- 9 files changed, 47 insertions(+), 106 deletions(-)
diff --git a/include/net/protocol.h b/include/net/protocol.h index 4fc75f7ae23b..312a27393e0b 100644 --- a/include/net/protocol.h +++ b/include/net/protocol.h @@ -39,8 +39,6 @@
/* This is used to register protocols. */ struct net_protocol { - int (*early_demux)(struct sk_buff *skb); - int (*early_demux_handler)(struct sk_buff *skb); int (*handler)(struct sk_buff *skb); void (*err_handler)(struct sk_buff *skb, u32 info); unsigned int no_policy:1, @@ -54,8 +52,6 @@ struct net_protocol {
#if IS_ENABLED(CONFIG_IPV6) struct inet6_protocol { - void (*early_demux)(struct sk_buff *skb); - void (*early_demux_handler)(struct sk_buff *skb); int (*handler)(struct sk_buff *skb);
void (*err_handler)(struct sk_buff *skb, diff --git a/include/net/tcp.h b/include/net/tcp.h index 91b69dc5bed7..12741aed95b5 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -903,6 +903,8 @@ static inline int tcp_v6_sdif(const struct sk_buff *skb) #endif return 0; } + +void tcp_v6_early_demux(struct sk_buff *skb); #endif
static inline bool inet_exact_dif_match(struct net *net, struct sk_buff *skb) diff --git a/include/net/udp.h b/include/net/udp.h index 4d82cabd733c..7f3067172e50 100644 --- a/include/net/udp.h +++ b/include/net/udp.h @@ -173,6 +173,7 @@ typedef struct sock *(*udp_lookup_t)(struct sk_buff *skb, __be16 sport, struct sk_buff *udp_gro_receive(struct list_head *head, struct sk_buff *skb, struct udphdr *uh, udp_lookup_t lookup); int udp_gro_complete(struct sk_buff *skb, int nhoff, udp_lookup_t lookup); +void udp_v6_early_demux(struct sk_buff *skb);
struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb, netdev_features_t features); diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index 72aee2b4a6ee..13e8635bcfa6 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -1692,12 +1692,7 @@ static const struct net_protocol igmp_protocol = { }; #endif
-/* thinking of making this const? Don't. - * early_demux can change based on sysctl. - */ -static struct net_protocol tcp_protocol = { - .early_demux = tcp_v4_early_demux, - .early_demux_handler = tcp_v4_early_demux, +static const struct net_protocol tcp_protocol = { .handler = tcp_v4_rcv, .err_handler = tcp_v4_err, .no_policy = 1, @@ -1705,12 +1700,7 @@ static struct net_protocol tcp_protocol = { .icmp_strict_tag_validation = 1, };
-/* thinking of making this const? Don't. - * early_demux can change based on sysctl. - */ -static struct net_protocol udp_protocol = { - .early_demux = udp_v4_early_demux, - .early_demux_handler = udp_v4_early_demux, +static const struct net_protocol udp_protocol = { .handler = udp_rcv, .err_handler = udp_err, .no_policy = 1, diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c index d4ab0edb53b5..7c00d4f786fa 100644 --- a/net/ipv4/ip_input.c +++ b/net/ipv4/ip_input.c @@ -306,28 +306,38 @@ static inline bool ip_rcv_options(struct sk_buff *skb, struct net_device *dev) return true; }
+int udp_v4_early_demux(struct sk_buff *); +int tcp_v4_early_demux(struct sk_buff *); static int ip_rcv_finish_core(struct net *net, struct sock *sk, struct sk_buff *skb, struct net_device *dev) { const struct iphdr *iph = ip_hdr(skb); - int (*edemux)(struct sk_buff *skb); struct rtable *rt; int err;
- if (net->ipv4.sysctl_ip_early_demux && + if (READ_ONCE(net->ipv4.sysctl_ip_early_demux) && !skb_dst(skb) && !skb->sk && !ip_is_fragment(iph)) { - const struct net_protocol *ipprot; - int protocol = iph->protocol; + switch (iph->protocol) { + case IPPROTO_TCP: + if (READ_ONCE(net->ipv4.sysctl_tcp_early_demux)) { + tcp_v4_early_demux(skb);
- ipprot = rcu_dereference(inet_protos[protocol]); - if (ipprot && (edemux = READ_ONCE(ipprot->early_demux))) { - err = edemux(skb); - if (unlikely(err)) - goto drop_error; - /* must reload iph, skb->head might have changed */ - iph = ip_hdr(skb); + /* must reload iph, skb->head might have changed */ + iph = ip_hdr(skb); + } + break; + case IPPROTO_UDP: + if (READ_ONCE(net->ipv4.sysctl_udp_early_demux)) { + err = udp_v4_early_demux(skb); + if (unlikely(err)) + goto drop_error; + + /* must reload iph, skb->head might have changed */ + iph = ip_hdr(skb); + } + break; } }
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c index 3311e7229c05..5d18e956cbe1 100644 --- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -332,61 +332,6 @@ static int proc_tcp_fastopen_key(struct ctl_table *table, int write, return ret; }
-static void proc_configure_early_demux(int enabled, int protocol) -{ - struct net_protocol *ipprot; -#if IS_ENABLED(CONFIG_IPV6) - struct inet6_protocol *ip6prot; -#endif - - rcu_read_lock(); - - ipprot = rcu_dereference(inet_protos[protocol]); - if (ipprot) - ipprot->early_demux = enabled ? ipprot->early_demux_handler : - NULL; - -#if IS_ENABLED(CONFIG_IPV6) - ip6prot = rcu_dereference(inet6_protos[protocol]); - if (ip6prot) - ip6prot->early_demux = enabled ? ip6prot->early_demux_handler : - NULL; -#endif - rcu_read_unlock(); -} - -static int proc_tcp_early_demux(struct ctl_table *table, int write, - void __user *buffer, size_t *lenp, loff_t *ppos) -{ - int ret = 0; - - ret = proc_dointvec(table, write, buffer, lenp, ppos); - - if (write && !ret) { - int enabled = init_net.ipv4.sysctl_tcp_early_demux; - - proc_configure_early_demux(enabled, IPPROTO_TCP); - } - - return ret; -} - -static int proc_udp_early_demux(struct ctl_table *table, int write, - void __user *buffer, size_t *lenp, loff_t *ppos) -{ - int ret = 0; - - ret = proc_dointvec(table, write, buffer, lenp, ppos); - - if (write && !ret) { - int enabled = init_net.ipv4.sysctl_udp_early_demux; - - proc_configure_early_demux(enabled, IPPROTO_UDP); - } - - return ret; -} - static int proc_tfo_blackhole_detect_timeout(struct ctl_table *table, int write, void __user *buffer, @@ -699,14 +644,14 @@ static struct ctl_table ipv4_net_table[] = { .data = &init_net.ipv4.sysctl_udp_early_demux, .maxlen = sizeof(int), .mode = 0644, - .proc_handler = proc_udp_early_demux + .proc_handler = proc_douintvec_minmax, }, { .procname = "tcp_early_demux", .data = &init_net.ipv4.sysctl_tcp_early_demux, .maxlen = sizeof(int), .mode = 0644, - .proc_handler = proc_tcp_early_demux + .proc_handler = proc_douintvec_minmax, }, { .procname = "ip_default_ttl", diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c index 0dbd9f99bb54..da994b2e9a25 100644 --- a/net/ipv6/ip6_input.c +++ b/net/ipv6/ip6_input.c @@ -47,18 +47,25 @@ #include <net/inet_ecn.h> #include <net/dst_metadata.h>
+void udp_v6_early_demux(struct sk_buff *); +void tcp_v6_early_demux(struct sk_buff *); static void ip6_rcv_finish_core(struct net *net, struct sock *sk, struct sk_buff *skb) { - void (*edemux)(struct sk_buff *skb); - - if (net->ipv4.sysctl_ip_early_demux && !skb_dst(skb) && skb->sk == NULL) { - const struct inet6_protocol *ipprot; - - ipprot = rcu_dereference(inet6_protos[ipv6_hdr(skb)->nexthdr]); - if (ipprot && (edemux = READ_ONCE(ipprot->early_demux))) - edemux(skb); + if (READ_ONCE(net->ipv4.sysctl_ip_early_demux) && + !skb_dst(skb) && !skb->sk) { + switch (ipv6_hdr(skb)->nexthdr) { + case IPPROTO_TCP: + if (READ_ONCE(net->ipv4.sysctl_tcp_early_demux)) + tcp_v6_early_demux(skb); + break; + case IPPROTO_UDP: + if (READ_ONCE(net->ipv4.sysctl_udp_early_demux)) + udp_v6_early_demux(skb); + break; + } } + if (!skb_valid_dst(skb)) ip6_route_input(skb); } diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 79171f2e3c83..79164360b635 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -1664,7 +1664,7 @@ static int tcp_v6_rcv(struct sk_buff *skb) goto discard_it; }
-static void tcp_v6_early_demux(struct sk_buff *skb) +void tcp_v6_early_demux(struct sk_buff *skb) { const struct ipv6hdr *hdr; const struct tcphdr *th; @@ -2019,12 +2019,7 @@ struct proto tcpv6_prot = { .diag_destroy = tcp_abort, };
-/* thinking of making this const? Don't. - * early_demux can change based on sysctl. - */ -static struct inet6_protocol tcpv6_protocol = { - .early_demux = tcp_v6_early_demux, - .early_demux_handler = tcp_v6_early_demux, +static const struct inet6_protocol tcpv6_protocol = { .handler = tcp_v6_rcv, .err_handler = tcp_v6_err, .flags = INET6_PROTO_NOPOLICY|INET6_PROTO_FINAL, diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 3f66026098cc..88aa8bfb46af 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -910,7 +910,7 @@ static struct sock *__udp6_lib_demux_lookup(struct net *net, return NULL; }
-static void udp_v6_early_demux(struct sk_buff *skb) +void udp_v6_early_demux(struct sk_buff *skb) { struct net *net = dev_net(skb->dev); const struct udphdr *uh; @@ -1533,12 +1533,7 @@ int compat_udpv6_getsockopt(struct sock *sk, int level, int optname, } #endif
-/* thinking of making this const? Don't. - * early_demux can change based on sysctl. - */ -static struct inet6_protocol udpv6_protocol = { - .early_demux = udp_v6_early_demux, - .early_demux_handler = udp_v6_early_demux, +static const struct inet6_protocol udpv6_protocol = { .handler = udpv6_rcv, .err_handler = udpv6_err, .flags = INET6_PROTO_NOPOLICY|INET6_PROTO_FINAL,
From: Jiri Benc jbenc@redhat.com
stable inclusion from stable-v4.19.267 commit bd5362e58721e4d0d1a37796593bd6e51536ce7a category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA
--------------------------------
[ Upstream commit 9e4b7a99a03aefd37ba7bb1f022c8efab5019165 ]
Since commit 3dcbdb134f32 ("net: gso: Fix skb_segment splat when splitting gso_size mangled skb having linear-headed frag_list"), it is allowed to change gso_size of a GRO packet. However, that commit assumes that "checking the first list_skb member suffices; i.e if either of the list_skb members have non head_frag head, then the first one has too".
It turns out this assumption does not hold. We've seen BUG_ON being hit in skb_segment when skbs on the frag_list had differing head_frag with the vmxnet3 driver. This happens because __netdev_alloc_skb and __napi_alloc_skb can return a skb that is page backed or kmalloced depending on the requested size. As the result, the last small skb in the GRO packet can be kmalloced.
There are three different locations where this can be fixed:
(1) We could check head_frag in GRO and not allow GROing skbs with different head_frag. However, that would lead to performance regression on normal forward paths with unmodified gso_size, where !head_frag in the last packet is not a problem.
(2) Set a flag in bpf_skb_net_grow and bpf_skb_net_shrink indicating that NETIF_F_SG is undesirable. That would need to eat a bit in sk_buff. Furthermore, that flag can be unset when all skbs on the frag_list are page backed. To retain good performance, bpf_skb_net_grow/shrink would have to walk the frag_list.
(3) Walk the frag_list in skb_segment when determining whether NETIF_F_SG should be cleared. This of course slows things down.
This patch implements (3). To limit the performance impact in skb_segment, the list is walked only for skbs with SKB_GSO_DODGY set that have gso_size changed. Normal paths thus will not hit it.
We could check only the last skb but since we need to walk the whole list anyway, let's stay on the safe side.
Fixes: 3dcbdb134f32 ("net: gso: Fix skb_segment splat when splitting gso_size mangled skb having linear-headed frag_list") Signed-off-by: Jiri Benc jbenc@redhat.com Reviewed-by: Willem de Bruijn willemb@google.com Link: https://lore.kernel.org/r/e04426a6a91baf4d1081e1b478c82b5de25fdf21.166740794... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/core/skbuff.c | 36 +++++++++++++++++++----------------- 1 file changed, 19 insertions(+), 17 deletions(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c index e0be1f8651bb..4178fc28c277 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -3560,23 +3560,25 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb, int pos; int dummy;
- if (list_skb && !list_skb->head_frag && skb_headlen(list_skb) && - (skb_shinfo(head_skb)->gso_type & SKB_GSO_DODGY)) { - /* gso_size is untrusted, and we have a frag_list with a linear - * non head_frag head. - * - * (we assume checking the first list_skb member suffices; - * i.e if either of the list_skb members have non head_frag - * head, then the first one has too). - * - * If head_skb's headlen does not fit requested gso_size, it - * means that the frag_list members do NOT terminate on exact - * gso_size boundaries. Hence we cannot perform skb_frag_t page - * sharing. Therefore we must fallback to copying the frag_list - * skbs; we do so by disabling SG. - */ - if (mss != GSO_BY_FRAGS && mss != skb_headlen(head_skb)) - features &= ~NETIF_F_SG; + if ((skb_shinfo(head_skb)->gso_type & SKB_GSO_DODGY) && + mss != GSO_BY_FRAGS && mss != skb_headlen(head_skb)) { + struct sk_buff *check_skb; + + for (check_skb = list_skb; check_skb; check_skb = check_skb->next) { + if (skb_headlen(check_skb) && !check_skb->head_frag) { + /* gso_size is untrusted, and we have a frag_list with + * a linear non head_frag item. + * + * If head_skb's headlen does not fit requested gso_size, + * it means that the frag_list members do NOT terminate + * on exact gso_size boundaries. Hence we cannot perform + * skb_frag_t page sharing. Therefore we must fallback to + * copying the frag_list skbs; we do so by disabling SG. + */ + features &= ~NETIF_F_SG; + break; + } + } }
__skb_push(head_skb, doffset);
From: Alexander Potapenko glider@google.com
stable inclusion from stable-v4.19.267 commit 6d26d0587abccb9835382a0b53faa7b9b1cd83e3 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA
--------------------------------
[ Upstream commit c23fb2c82267638f9d206cb96bb93e1f93ad7828 ]
When copying a `struct ifaddrlblmsg` to the network, __ifal_reserved remained uninitialized, resulting in a 1-byte infoleak:
BUG: KMSAN: kernel-network-infoleak in __netdev_start_xmit ./include/linux/netdevice.h:4841 __netdev_start_xmit ./include/linux/netdevice.h:4841 netdev_start_xmit ./include/linux/netdevice.h:4857 xmit_one net/core/dev.c:3590 dev_hard_start_xmit+0x1dc/0x800 net/core/dev.c:3606 __dev_queue_xmit+0x17e8/0x4350 net/core/dev.c:4256 dev_queue_xmit ./include/linux/netdevice.h:3009 __netlink_deliver_tap_skb net/netlink/af_netlink.c:307 __netlink_deliver_tap+0x728/0xad0 net/netlink/af_netlink.c:325 netlink_deliver_tap net/netlink/af_netlink.c:338 __netlink_sendskb net/netlink/af_netlink.c:1263 netlink_sendskb+0x1d9/0x200 net/netlink/af_netlink.c:1272 netlink_unicast+0x56d/0xf50 net/netlink/af_netlink.c:1360 nlmsg_unicast ./include/net/netlink.h:1061 rtnl_unicast+0x5a/0x80 net/core/rtnetlink.c:758 ip6addrlbl_get+0xfad/0x10f0 net/ipv6/addrlabel.c:628 rtnetlink_rcv_msg+0xb33/0x1570 net/core/rtnetlink.c:6082 ... Uninit was created at: slab_post_alloc_hook+0x118/0xb00 mm/slab.h:742 slab_alloc_node mm/slub.c:3398 __kmem_cache_alloc_node+0x4f2/0x930 mm/slub.c:3437 __do_kmalloc_node mm/slab_common.c:954 __kmalloc_node_track_caller+0x117/0x3d0 mm/slab_common.c:975 kmalloc_reserve net/core/skbuff.c:437 __alloc_skb+0x27a/0xab0 net/core/skbuff.c:509 alloc_skb ./include/linux/skbuff.h:1267 nlmsg_new ./include/net/netlink.h:964 ip6addrlbl_get+0x490/0x10f0 net/ipv6/addrlabel.c:608 rtnetlink_rcv_msg+0xb33/0x1570 net/core/rtnetlink.c:6082 netlink_rcv_skb+0x299/0x550 net/netlink/af_netlink.c:2540 rtnetlink_rcv+0x26/0x30 net/core/rtnetlink.c:6109 netlink_unicast_kernel net/netlink/af_netlink.c:1319 netlink_unicast+0x9ab/0xf50 net/netlink/af_netlink.c:1345 netlink_sendmsg+0xebc/0x10f0 net/netlink/af_netlink.c:1921 ...
This patch ensures that the reserved field is always initialized.
Reported-by: syzbot+3553517af6020c4f2813f1003fe76ef3cbffe98d@syzkaller.appspotmail.com Fixes: 2a8cc6c89039 ("[IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table.") Signed-off-by: Alexander Potapenko glider@google.com Reviewed-by: David Ahern dsahern@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/ipv6/addrlabel.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/ipv6/addrlabel.c b/net/ipv6/addrlabel.c index c7dc8b2de6c2..7fdd433b968e 100644 --- a/net/ipv6/addrlabel.c +++ b/net/ipv6/addrlabel.c @@ -437,6 +437,7 @@ static void ip6addrlbl_putmsg(struct nlmsghdr *nlh, { struct ifaddrlblmsg *ifal = nlmsg_data(nlh); ifal->ifal_family = AF_INET6; + ifal->__ifal_reserved = 0; ifal->ifal_prefixlen = prefixlen; ifal->ifal_flags = 0; ifal->ifal_index = ifindex;
From: Chuang Wang nashuiliang@gmail.com
stable inclusion from stable-v4.19.267 commit a81b44d1df1f07f00c0dcc0a0b3d2fa24a46289e category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA
--------------------------------
[ Upstream commit 23569b5652ee8e8e55a12f7835f59af6f3cefc30 ]
kmemleak reports memory leaks in macvlan_common_newlink, as follows:
ip link add link eth0 name .. type macvlan mode source macaddr add <MAC-ADDR>
kmemleak reports:
unreferenced object 0xffff8880109bb140 (size 64): comm "ip", pid 284, jiffies 4294986150 (age 430.108s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 b8 aa 5a 12 80 88 ff ff ..........Z..... 80 1b fa 0d 80 88 ff ff 1e ff ac af c7 c1 6b 6b ..............kk backtrace: [<ffffffff813e06a7>] kmem_cache_alloc_trace+0x1c7/0x300 [<ffffffff81b66025>] macvlan_hash_add_source+0x45/0xc0 [<ffffffff81b66a67>] macvlan_changelink_sources+0xd7/0x170 [<ffffffff81b6775c>] macvlan_common_newlink+0x38c/0x5a0 [<ffffffff81b6797e>] macvlan_newlink+0xe/0x20 [<ffffffff81d97f8f>] __rtnl_newlink+0x7af/0xa50 [<ffffffff81d98278>] rtnl_newlink+0x48/0x70 ...
In the scenario where the macvlan mode is configured as 'source', macvlan_changelink_sources() will be execured to reconfigure list of remote source mac addresses, at the same time, if register_netdevice() return an error, the resource generated by macvlan_changelink_sources() is not cleaned up.
Using this patch, in the case of an error, it will execute macvlan_flush_sources() to ensure that the resource is cleaned up.
Fixes: aa5fd0fb7748 ("driver: macvlan: Destroy new macvlan port if macvlan_common_newlink failed.") Signed-off-by: Chuang Wang nashuiliang@gmail.com Link: https://lore.kernel.org/r/20221109090735.690500-1-nashuiliang@gmail.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- drivers/net/macvlan.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c index 44853cc2d9be..6d346347c9d3 100644 --- a/drivers/net/macvlan.c +++ b/drivers/net/macvlan.c @@ -1470,8 +1470,10 @@ int macvlan_common_newlink(struct net *src_net, struct net_device *dev, /* the macvlan port may be freed by macvlan_uninit when fail to register. * so we destroy the macvlan port only when it's valid. */ - if (create && macvlan_port_get_rtnl(lowerdev)) + if (create && macvlan_port_get_rtnl(lowerdev)) { + macvlan_flush_sources(port, vlan); macvlan_port_destroy(port->dev); + } return err; } EXPORT_SYMBOL_GPL(macvlan_common_newlink);
From: Eric Dumazet edumazet@google.com
stable inclusion from stable-v4.19.267 commit 650137a7c0b2892df2e5b0bc112d7b09a78c93c8 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA
--------------------------------
commit b64085b00044bdf3cd1c9825e9ef5b2e0feae91a upstream.
macvlan should enforce a minimal mtu of 68, even at link creation.
This patch avoids the current behavior (which could lead to crashes in ipv6 stack if the link is brought up)
$ ip link add macvlan1 link eno1 mtu 8 type macvlan # This should fail ! $ ip link sh dev macvlan1 5: macvlan1@eno1: <BROADCAST,MULTICAST> mtu 8 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 02:47:6c:24:74:82 brd ff:ff:ff:ff:ff:ff $ ip link set macvlan1 mtu 67 Error: mtu less than device minimum. $ ip link set macvlan1 mtu 68 $ ip link set macvlan1 mtu 8 Error: mtu less than device minimum.
Fixes: 91572088e3fd ("net: use core MTU range checking in core net infra") Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- drivers/net/macvlan.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c index 6d346347c9d3..956ebfb41921 100644 --- a/drivers/net/macvlan.c +++ b/drivers/net/macvlan.c @@ -1136,7 +1136,7 @@ void macvlan_common_setup(struct net_device *dev) { ether_setup(dev);
- dev->min_mtu = 0; + /* ether_setup() has set dev->min_mtu to ETH_MIN_MTU. */ dev->max_mtu = ETH_MAX_MTU; dev->priv_flags &= ~IFF_TX_SKB_SHARING; netif_keep_dst(dev);
From: Baisong Zhong zhongbaisong@huawei.com
stable inclusion from stable-v4.19.267 commit 730fb1ef974a13915bc7651364d8b3318891cd70 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA
--------------------------------
commit d3fd203f36d46aa29600a72d57a1b61af80e4a25 upstream.
We got a syzkaller problem because of aarch64 alignment fault if KFENCE enabled. When the size from user bpf program is an odd number, like 399, 407, etc, it will cause the struct skb_shared_info's unaligned access. As seen below:
BUG: KFENCE: use-after-free read in __skb_clone+0x23c/0x2a0 net/core/skbuff.c:1032
Use-after-free read at 0xffff6254fffac077 (in kfence-#213): __lse_atomic_add arch/arm64/include/asm/atomic_lse.h:26 [inline] arch_atomic_add arch/arm64/include/asm/atomic.h:28 [inline] arch_atomic_inc include/linux/atomic-arch-fallback.h:270 [inline] atomic_inc include/asm-generic/atomic-instrumented.h:241 [inline] __skb_clone+0x23c/0x2a0 net/core/skbuff.c:1032 skb_clone+0xf4/0x214 net/core/skbuff.c:1481 ____bpf_clone_redirect net/core/filter.c:2433 [inline] bpf_clone_redirect+0x78/0x1c0 net/core/filter.c:2420 bpf_prog_d3839dd9068ceb51+0x80/0x330 bpf_dispatcher_nop_func include/linux/bpf.h:728 [inline] bpf_test_run+0x3c0/0x6c0 net/bpf/test_run.c:53 bpf_prog_test_run_skb+0x638/0xa7c net/bpf/test_run.c:594 bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline] __do_sys_bpf kernel/bpf/syscall.c:4441 [inline] __se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381
kfence-#213: 0xffff6254fffac000-0xffff6254fffac196, size=407, cache=kmalloc-512
allocated by task 15074 on cpu 0 at 1342.585390s: kmalloc include/linux/slab.h:568 [inline] kzalloc include/linux/slab.h:675 [inline] bpf_test_init.isra.0+0xac/0x290 net/bpf/test_run.c:191 bpf_prog_test_run_skb+0x11c/0xa7c net/bpf/test_run.c:512 bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline] __do_sys_bpf kernel/bpf/syscall.c:4441 [inline] __se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381 __arm64_sys_bpf+0x50/0x60 kernel/bpf/syscall.c:4381
To fix the problem, we adjust @size so that (@size + @hearoom) is a multiple of SMP_CACHE_BYTES. So we make sure the struct skb_shared_info is aligned to a cache line.
Fixes: 1cf1cae963c2 ("bpf: introduce BPF_PROG_TEST_RUN command") Signed-off-by: Baisong Zhong zhongbaisong@huawei.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Cc: Eric Dumazet edumazet@google.com Link: https://lore.kernel.org/bpf/20221102081620.1465154-1-zhongbaisong@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- net/bpf/test_run.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index fba139c995aa..f6714cb3b0ba 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -87,6 +87,7 @@ static void *bpf_test_init(const union bpf_attr *kattr, u32 size, if (size < ETH_HLEN || size > PAGE_SIZE - headroom - tailroom) return ERR_PTR(-EINVAL);
+ size = SKB_DATA_ALIGN(size); data = kzalloc(size + headroom + tailroom, GFP_USER); if (!data) return ERR_PTR(-ENOMEM);