From: Josef Bacik <josef(a)toxicpanda.com>
stable inclusion
from stable-v6.6.24
commit ded566b4637f1b6b4c9ba74e7d0b8493e93f19cf
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I9Q8ZZ
CVE: CVE-2024-35784
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
--------------------------------
commit b0ad381fa7690244802aed119b478b4bdafc31dd upstream.
While working on the patchset to remove extent locking I got a lockdep
splat with fiemap and pagefaulting with my new extent lock replacement
lock.
This deadlock exists with our normal code; we just don't have lockdep
annotations with the extent locking, so we've never noticed it.
Since we're copying the fiemap extent to user space on every iteration,
we have the chance of pagefaulting. Because we hold the extent lock for
the entire range, we could mkwrite into a range in the file that we have
mmap'ed. This would deadlock with the following stack trace:
[<0>] lock_extent+0x28d/0x2f0
[<0>] btrfs_page_mkwrite+0x273/0x8a0
[<0>] do_page_mkwrite+0x50/0xb0
[<0>] do_fault+0xc1/0x7b0
[<0>] __handle_mm_fault+0x2fa/0x460
[<0>] handle_mm_fault+0xa4/0x330
[<0>] do_user_addr_fault+0x1f4/0x800
[<0>] exc_page_fault+0x7c/0x1e0
[<0>] asm_exc_page_fault+0x26/0x30
[<0>] rep_movs_alternative+0x33/0x70
[<0>] _copy_to_user+0x49/0x70
[<0>] fiemap_fill_next_extent+0xc8/0x120
[<0>] emit_fiemap_extent+0x4d/0xa0
[<0>] extent_fiemap+0x7f8/0xad0
[<0>] btrfs_fiemap+0x49/0x80
[<0>] __x64_sys_ioctl+0x3e1/0xb50
[<0>] do_syscall_64+0x94/0x1a0
[<0>] entry_SYSCALL_64_after_hwframe+0x6e/0x76
I wrote an fstest to reproduce this deadlock without my replacement lock
and verified that the deadlock exists with our existing locking.
To fix this, simply don't take the extent lock for the entire duration of
the fiemap. This is safe in general because we keep track of where we
are when we're searching the tree, so if an ordered extent updates in
the middle of our fiemap call we'll still emit the correct extents,
because we know what offset we were on before.
The only place we maintain the lock is searching delalloc. Since the
delalloc stuff can change during writeback we want to lock the extent
range so we have a consistent view of delalloc at the time we're
checking to see if we need to set the delalloc flag.
With this patch applied we no longer deadlock with my testcase.
CC: stable(a)vger.kernel.org # 6.1+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Conflicts:
fs/btrfs/extent_io.c
[The key to fixing the patch is the fiemap_process_hole function, which
locks only before querying delalloc. Earlier versions do not have this
function, and adaptation requires a lot of refactoring patches. So do
something similar directly in the get_extent_skip_holes function, which
contains the logic to query delalloc.]
Signed-off-by: Zizhi Wo <wozizhi(a)huawei.com>
---
fs/btrfs/extent_io.c | 26 +++++++++++++++++++-------
1 file changed, 19 insertions(+), 7 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 685a375bb6af..e8ae864a0337 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4532,11 +4532,30 @@ static struct extent_map *get_extent_skip_holes(struct btrfs_inode *inode,
return NULL;
while (1) {
+ struct extent_state *cached_state = NULL;
+ u64 lockstart;
+ u64 lockend;
+
len = last - offset;
if (len == 0)
break;
len = ALIGN(len, sectorsize);
+ lockstart = round_down(offset, sectorsize);
+ lockend = round_up(offset + len, sectorsize) - 1;
+
+ /*
+ * We are only locking for the delalloc range because that's the
+ * only thing that can change here. With fiemap we have a lock
+ * on the inode, so no buffered or direct writes can happen.
+ *
+ * However mmaps and normal page writeback will cause this to
+ * change arbitrarily. We have to lock the extent lock here to
+ * make sure that nobody messes with the tree while we're doing
+ * btrfs_find_delalloc_in_range.
+ */
+ lock_extent_bits(&inode->io_tree, lockstart, lockend, &cached_state);
em = btrfs_get_extent_fiemap(inode, offset, len);
+ unlock_extent_cached(&inode->io_tree, lockstart, lockend, &cached_state);
if (IS_ERR_OR_NULL(em))
return em;
@@ -4679,7 +4698,6 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
u64 isize = i_size_read(&inode->vfs_inode);
struct btrfs_key found_key;
struct extent_map *em = NULL;
- struct extent_state *cached_state = NULL;
struct btrfs_path *path;
struct btrfs_root *root = inode->root;
struct fiemap_cache cache = { 0 };
@@ -4758,9 +4776,6 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
last_for_get_extent = isize;
}
- lock_extent_bits(&inode->io_tree, start, start + len - 1,
- &cached_state);
-
em = get_extent_skip_holes(inode, start, last_for_get_extent);
if (!em)
goto out;
@@ -4871,9 +4886,6 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
ret = emit_last_fiemap_cache(fieinfo, &cache);
free_extent_map(em);
out:
- unlock_extent_cached(&inode->io_tree, start, start + len - 1,
- &cached_state);
-
out_free_ulist:
btrfs_free_path(path);
ulist_free(roots);
--
2.39.2
From: Willem de Bruijn <willemb(a)google.com>
mainline inclusion
from mainline-v6.7-rc1
commit 7b3ba18703a63f6fd487183b9262b08e5632da1b
category: bugfix
bugzilla: 189991, https://gitee.com/src-openeuler/kernel/issues/I9RG0B
CVE: CVE-2023-52843
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
--------------------------------
LLC reads the mac header with eth_hdr without verifying that the skb
has an Ethernet header.
Syzbot was able to enter llc_rcv on a tun device. Tun can insert
packets without mac len and with user configurable skb->protocol
(passing a tun_pi header when not configuring IFF_NO_PI).
BUG: KMSAN: uninit-value in llc_station_ac_send_test_r net/llc/llc_station.c:81 [inline]
BUG: KMSAN: uninit-value in llc_station_rcv+0x6fb/0x1290 net/llc/llc_station.c:111
llc_station_ac_send_test_r net/llc/llc_station.c:81 [inline]
llc_station_rcv+0x6fb/0x1290 net/llc/llc_station.c:111
llc_rcv+0xc5d/0x14a0 net/llc/llc_input.c:218
__netif_receive_skb_one_core net/core/dev.c:5523 [inline]
__netif_receive_skb+0x1a6/0x5a0 net/core/dev.c:5637
netif_receive_skb_internal net/core/dev.c:5723 [inline]
netif_receive_skb+0x58/0x660 net/core/dev.c:5782
tun_rx_batched+0x3ee/0x980 drivers/net/tun.c:1555
tun_get_user+0x54c5/0x69c0 drivers/net/tun.c:2002
Add a mac_len test before all three eth_hdr(skb) calls under net/llc.
There are further uses in include/net/llc_pdu.h. All of these are
protected by a skb->protocol == ETH_P_802_2 test, which does not
protect against this tun scenario. But the mac_len test added in this
patch in llc_fixup_skb will indirectly protect those too: it is called
from llc_rcv before any other LLC code.
It is tempting to just add a blanket mac_len check in llc_rcv, but it
is not clear whether that could break valid LLC paths that do not
assume an Ethernet header: 802.2 LLC may in principle be used on top
of non-802.3 protocols. The commit referenced below shows that it once
was, on top of Token Ring.
At least one of the three eth_hdr uses goes back to before the start
of git history, but the one that syzbot exercises was introduced in
that commit. It is old enough (2008) that effectively all stable
kernels should receive this.
Fixes: f83f1768f833 ("[LLC]: skb allocation size for responses")
Reported-by: syzbot+a8c7be6dee0de1b669cc(a)syzkaller.appspotmail.com
Signed-off-by: Willem de Bruijn <willemb(a)google.com>
Link: https://lore.kernel.org/r/20231025234251.3796495-1-willemdebruijn.kernel@gm…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Dong Chenchen <dongchenchen2(a)huawei.com>
---
net/llc/llc_input.c | 10 ++++++++--
net/llc/llc_s_ac.c | 3 +++
net/llc/llc_station.c | 3 +++
3 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/net/llc/llc_input.c b/net/llc/llc_input.c
index 82cb93f66b9b..de5023b9607c 100644
--- a/net/llc/llc_input.c
+++ b/net/llc/llc_input.c
@@ -127,8 +127,14 @@ static inline int llc_fixup_skb(struct sk_buff *skb)
skb->transport_header += llc_len;
skb_pull(skb, llc_len);
if (skb->protocol == htons(ETH_P_802_2)) {
- __be16 pdulen = eth_hdr(skb)->h_proto;
- s32 data_size = ntohs(pdulen) - llc_len;
+ __be16 pdulen;
+ s32 data_size;
+
+ if (skb->mac_len < ETH_HLEN)
+ return 0;
+
+ pdulen = eth_hdr(skb)->h_proto;
+ data_size = ntohs(pdulen) - llc_len;
if (data_size < 0 ||
!pskb_may_pull(skb, data_size))
diff --git a/net/llc/llc_s_ac.c b/net/llc/llc_s_ac.c
index 7ae4cc684d3a..4cf636bb7850 100644
--- a/net/llc/llc_s_ac.c
+++ b/net/llc/llc_s_ac.c
@@ -153,6 +153,9 @@ int llc_sap_action_send_test_r(struct llc_sap *sap, struct sk_buff *skb)
int rc = 1;
u32 data_size;
+ if (skb->mac_len < ETH_HLEN)
+ return 1;
+
llc_pdu_decode_sa(skb, mac_da);
llc_pdu_decode_da(skb, mac_sa);
llc_pdu_decode_ssap(skb, &dsap);
diff --git a/net/llc/llc_station.c b/net/llc/llc_station.c
index c29170e767a8..64e2c67e16ba 100644
--- a/net/llc/llc_station.c
+++ b/net/llc/llc_station.c
@@ -77,6 +77,9 @@ static int llc_station_ac_send_test_r(struct sk_buff *skb)
u32 data_size;
struct sk_buff *nskb;
+ if (skb->mac_len < ETH_HLEN)
+ goto out;
+
/* The test request command is type U (llc_len = 3) */
data_size = ntohs(eth_hdr(skb)->h_proto) - 3;
nskb = llc_alloc_frame(NULL, skb->dev, LLC_PDU_TYPE_U, data_size);
--
2.25.1
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9VPMT
---------------------------
Remove the warning printk for the 'rq->tmp_alone_branch !=
&rq->leaf_cfs_rq_list' check to avoid an rq deadlock.
Deadlock analysis:
cpu 0
distribute_cfs_runtime --- rq_lock_irqsave(rq, &rf);
->__warn_printk
->try_to_wake_up --- rq_lock(rq, &rf), deadlock
Call Trace:
queued_spin_lock_slowpath at ffff000080173358
try_to_wake_up at ffff000080141068
wake_up_process at ffff00008014113c
insert_work at ffff000080123750
__queue_work at ffff0000801257ac
queue_work_on at ffff000080125c54
drm_fb_helper_dirty at ffff0000806dcd44
drm_fb_helper_sys_imageblit at ffff0000806dcf04
virtio_gpu_3d_imageblit at ffff000000c915d0 [virtio_gpu]
soft_cursor at ffff0000805e3e04
bit_cursor at ffff0000805e3654
fbcon_cursor at ffff0000805df404
hide_cursor at ffff000080677d68
vt_console_print at ffff0000806799dc
console_unlock at ffff000080183d78
vprintk_emit at ffff000080185948
vprintk_default at ffff000080185b80
vprintk_func at ffff000080186c44
printk at ffff000080186394
__warn_printk at ffff000080102d60
unthrottle_cfs_rq at ffff000080155e50
distribute_cfs_runtime at ffff00008015617c
sched_cfs_period_timer at ffff00008015654c
__hrtimer_run_queues at ffff0000801b2c58
hrtimer_interrupt at ffff0000801b3c74
arch_timer_handler_virt at ffff00008089dc3c
handle_percpu_devid_irq at ffff00008018fb3c
generic_handle_irq at ffff000080187140
__handle_domain_irq at ffff000080187adc
gic_handle_irq at ffff000080081814
Fixes: 6e9efc5d870d ("sched/fair: Add tmp_alone_branch assertion")
Signed-off-by: Hui Tang <tanghui20(a)huawei.com>
---
kernel/sched/fair.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3bd5aa6dedb3..aee13d30a7de 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -393,9 +393,12 @@ static inline void list_del_leaf_cfs_rq(struct cfs_rq *cfs_rq)
}
}
+/*
+ * An rq deadlock is possible when the warning is triggered,
+ * since try_to_wake_up() may be called from __warn_printk().
+ */
static inline void assert_list_leaf_cfs_rq(struct rq *rq)
{
- SCHED_WARN_ON(rq->tmp_alone_branch != &rq->leaf_cfs_rq_list);
}
/* Iterate thr' all leaf cfs_rq's on a runqueue */
--
2.34.1
From: Josef Bacik <josef(a)toxicpanda.com>
stable inclusion
from stable-v6.6.24
commit ded566b4637f1b6b4c9ba74e7d0b8493e93f19cf
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I9Q8ZZ
CVE: CVE-2024-35784
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
--------------------------------
commit b0ad381fa7690244802aed119b478b4bdafc31dd upstream.
While working on the patchset to remove extent locking I got a lockdep
splat with fiemap and pagefaulting with my new extent lock replacement
lock.
This deadlock exists with our normal code; we just don't have lockdep
annotations with the extent locking, so we've never noticed it.
Since we're copying the fiemap extent to user space on every iteration,
we have the chance of pagefaulting. Because we hold the extent lock for
the entire range, we could mkwrite into a range in the file that we have
mmap'ed. This would deadlock with the following stack trace:
[<0>] lock_extent+0x28d/0x2f0
[<0>] btrfs_page_mkwrite+0x273/0x8a0
[<0>] do_page_mkwrite+0x50/0xb0
[<0>] do_fault+0xc1/0x7b0
[<0>] __handle_mm_fault+0x2fa/0x460
[<0>] handle_mm_fault+0xa4/0x330
[<0>] do_user_addr_fault+0x1f4/0x800
[<0>] exc_page_fault+0x7c/0x1e0
[<0>] asm_exc_page_fault+0x26/0x30
[<0>] rep_movs_alternative+0x33/0x70
[<0>] _copy_to_user+0x49/0x70
[<0>] fiemap_fill_next_extent+0xc8/0x120
[<0>] emit_fiemap_extent+0x4d/0xa0
[<0>] extent_fiemap+0x7f8/0xad0
[<0>] btrfs_fiemap+0x49/0x80
[<0>] __x64_sys_ioctl+0x3e1/0xb50
[<0>] do_syscall_64+0x94/0x1a0
[<0>] entry_SYSCALL_64_after_hwframe+0x6e/0x76
I wrote an fstest to reproduce this deadlock without my replacement lock
and verified that the deadlock exists with our existing locking.
To fix this simply don't take the extent lock for the entire duration of
the fiemap. This is safe in general because we keep track of where we
are when we're searching the tree, so if an ordered extent updates in
the middle of our fiemap call we'll still emit the correct extents
because we know what offset we were on before.
The only place we maintain the lock is searching delalloc. Since the
delalloc stuff can change during writeback we want to lock the extent
range so we have a consistent view of delalloc at the time we're
checking to see if we need to set the delalloc flag.
With this patch applied we no longer deadlock with my testcase.
CC: stable(a)vger.kernel.org # 6.1+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Conflicts:
fs/btrfs/extent_io.c
[The key to fixing the patch is the fiemap_process_hole function, which
locks only before querying delalloc. Earlier versions do not have this
function, and adaptation requires a lot of refactoring patches. So do
something similar directly in the get_extent_skip_holes function, which
contains the logic to query delalloc.]
Signed-off-by: Zizhi Wo <wozizhi(a)huawei.com>
---
fs/btrfs/extent_io.c | 25 +++++++++++++++++++------
1 file changed, 19 insertions(+), 6 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index b20021c501d7..8aa7738d9aad 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4332,12 +4332,31 @@ static struct extent_map *get_extent_skip_holes(struct inode *inode,
return NULL;
while (1) {
+ struct extent_state *cached_state = NULL;
+ u64 lockstart;
+ u64 lockend;
+
len = last - offset;
if (len == 0)
break;
len = ALIGN(len, sectorsize);
+ lockstart = round_down(offset, sectorsize);
+ lockend = round_up(offset + len, sectorsize) - 1;
+
+ /*
+ * We are only locking for the delalloc range because that's the
+ * only thing that can change here. With fiemap we have a lock
+ * on the inode, so no buffered or direct writes can happen.
+ *
+ * However mmaps and normal page writeback will cause this to
+ * change arbitrarily. We have to lock the extent lock here to
+ * make sure that nobody messes with the tree while we're doing
+ * btrfs_find_delalloc_in_range.
+ */
+ lock_extent(&BTRFS_I(inode)->io_tree, lockstart, lockend, &cached_state);
em = btrfs_get_extent_fiemap(BTRFS_I(inode), NULL, 0, offset,
len, 0);
+ unlock_extent(&BTRFS_I(inode)->io_tree, lockstart, lockend, &cached_state);
if (IS_ERR_OR_NULL(em))
return em;
@@ -4481,7 +4500,6 @@ int extent_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
u64 isize = i_size_read(inode);
struct btrfs_key found_key;
struct extent_map *em = NULL;
- struct extent_state *cached_state = NULL;
struct btrfs_path *path;
struct btrfs_root *root = BTRFS_I(inode)->root;
struct fiemap_cache cache = { 0 };
@@ -4547,9 +4565,6 @@ int extent_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
last_for_get_extent = isize;
}
- lock_extent_bits(&BTRFS_I(inode)->io_tree, start, start + len - 1,
- &cached_state);
-
em = get_extent_skip_holes(inode, start, last_for_get_extent);
if (!em)
goto out;
@@ -4662,8 +4677,6 @@ int extent_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
free_extent_map(em);
out:
btrfs_free_path(path);
- unlock_extent_cached(&BTRFS_I(inode)->io_tree, start, start + len - 1,
- &cached_state);
return ret;
}
--
2.39.2
From: Josef Bacik <josef(a)toxicpanda.com>
stable inclusion
from stable-v6.6.24
commit ded566b4637f1b6b4c9ba74e7d0b8493e93f19cf
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I9Q8ZZ
CVE: CVE-2024-35784
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
--------------------------------
commit b0ad381fa7690244802aed119b478b4bdafc31dd upstream.
While working on the patchset to remove extent locking I got a lockdep
splat with fiemap and pagefaulting with my new extent lock replacement
lock.
This deadlock exists with our normal code; we just don't have lockdep
annotations with the extent locking, so we've never noticed it.
Since we're copying the fiemap extent to user space on every iteration,
we have the chance of pagefaulting. Because we hold the extent lock for
the entire range, we could mkwrite into a range in the file that we have
mmap'ed. This would deadlock with the following stack trace:
[<0>] lock_extent+0x28d/0x2f0
[<0>] btrfs_page_mkwrite+0x273/0x8a0
[<0>] do_page_mkwrite+0x50/0xb0
[<0>] do_fault+0xc1/0x7b0
[<0>] __handle_mm_fault+0x2fa/0x460
[<0>] handle_mm_fault+0xa4/0x330
[<0>] do_user_addr_fault+0x1f4/0x800
[<0>] exc_page_fault+0x7c/0x1e0
[<0>] asm_exc_page_fault+0x26/0x30
[<0>] rep_movs_alternative+0x33/0x70
[<0>] _copy_to_user+0x49/0x70
[<0>] fiemap_fill_next_extent+0xc8/0x120
[<0>] emit_fiemap_extent+0x4d/0xa0
[<0>] extent_fiemap+0x7f8/0xad0
[<0>] btrfs_fiemap+0x49/0x80
[<0>] __x64_sys_ioctl+0x3e1/0xb50
[<0>] do_syscall_64+0x94/0x1a0
[<0>] entry_SYSCALL_64_after_hwframe+0x6e/0x76
I wrote an fstest to reproduce this deadlock without my replacement lock
and verified that the deadlock exists with our existing locking.
To fix this simply don't take the extent lock for the entire duration of
the fiemap. This is safe in general because we keep track of where we
are when we're searching the tree, so if an ordered extent updates in
the middle of our fiemap call we'll still emit the correct extents
because we know what offset we were on before.
The only place we maintain the lock is searching delalloc. Since the
delalloc stuff can change during writeback we want to lock the extent
range so we have a consistent view of delalloc at the time we're
checking to see if we need to set the delalloc flag.
With this patch applied we no longer deadlock with my testcase.
CC: stable(a)vger.kernel.org # 6.1+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Conflicts:
fs/btrfs/extent_io.c
[The key to fixing the patch is the fiemap_process_hole function, which
locks only before querying delalloc. Earlier versions do not have this
function, and adaptation requires a lot of refactoring patches. So do
something similar directly in the get_extent_skip_holes function, which
contains the logic to query delalloc.]
Signed-off-by: Zizhi Wo <wozizhi(a)huawei.com>
---
fs/btrfs/extent_io.c | 26 +++++++++++++++++++-------
1 file changed, 19 insertions(+), 7 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 685a375bb6af..0d7843323930 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4532,11 +4532,30 @@ static struct extent_map *get_extent_skip_holes(struct btrfs_inode *inode,
return NULL;
while (1) {
+ struct extent_state *cached_state = NULL;
+ u64 lockstart;
+ u64 lockend;
+
len = last - offset;
if (len == 0)
break;
len = ALIGN(len, sectorsize);
+ lockstart = round_down(offset, sectorsize);
+ lockend = round_up(offset + len, sectorsize) - 1;
+
+ /*
+ * We are only locking for the delalloc range because that's the
+ * only thing that can change here. With fiemap we have a lock
+ * on the inode, so no buffered or direct writes can happen.
+ *
+ * However mmaps and normal page writeback will cause this to
+ * change arbitrarily. We have to lock the extent lock here to
+ * make sure that nobody messes with the tree while we're doing
+ * btrfs_find_delalloc_in_range.
+ */
+ lock_extent(&inode->io_tree, lockstart, lockend, &cached_state);
em = btrfs_get_extent_fiemap(inode, offset, len);
+ unlock_extent(&inode->io_tree, lockstart, lockend, &cached_state);
if (IS_ERR_OR_NULL(em))
return em;
@@ -4679,7 +4698,6 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
u64 isize = i_size_read(&inode->vfs_inode);
struct btrfs_key found_key;
struct extent_map *em = NULL;
- struct extent_state *cached_state = NULL;
struct btrfs_path *path;
struct btrfs_root *root = inode->root;
struct fiemap_cache cache = { 0 };
@@ -4758,9 +4776,6 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
last_for_get_extent = isize;
}
- lock_extent_bits(&inode->io_tree, start, start + len - 1,
- &cached_state);
-
em = get_extent_skip_holes(inode, start, last_for_get_extent);
if (!em)
goto out;
@@ -4871,9 +4886,6 @@ int extent_fiemap(struct btrfs_inode *inode, struct fiemap_extent_info *fieinfo,
ret = emit_last_fiemap_cache(fieinfo, &cache);
free_extent_map(em);
out:
- unlock_extent_cached(&inode->io_tree, start, start + len - 1,
- &cached_state);
-
out_free_ulist:
btrfs_free_path(path);
ulist_free(roots);
--
2.39.2