June 2024 - Kernel - mailweb.openeuler.org

[PATCH openEuler-1.0-LTS] staging: rtl8712: fix use-after-free in rtl8712_dl_fw
by Jialin Zhang 08 Jun '24

08 Jun '24

From: Pavel Skripkin <paskripkin(a)gmail.com> mainline inclusion from mainline-v5.16-rc1 commit c052cc1a069c3e575619cf64ec427eb41176ca70 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I9RCV5 CVE: CVE-2021-47479 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?… -------------------------------- Syzbot reported use-after-free in rtl8712_dl_fw(). The problem was in race condition between r871xu_dev_remove() ->ndo_open() callback. It's easy to see from crash log, that driver accesses released firmware in ->ndo_open() callback. It may happen, since driver was releasing firmware _before_ unregistering netdev. Fix it by moving unregister_netdev() before cleaning up resources. Call Trace: ... rtl871x_open_fw drivers/staging/rtl8712/hal_init.c:83 [inline] rtl8712_dl_fw+0xd95/0xe10 drivers/staging/rtl8712/hal_init.c:170 rtl8712_hal_init drivers/staging/rtl8712/hal_init.c:330 [inline] rtl871x_hal_init+0xae/0x180 drivers/staging/rtl8712/hal_init.c:394 netdev_open+0xe6/0x6c0 drivers/staging/rtl8712/os_intfs.c:380 __dev_open+0x2bc/0x4d0 net/core/dev.c:1484 Freed by task 1306: ... release_firmware+0x1b/0x30 drivers/base/firmware_loader/main.c:1053 r871xu_dev_remove+0xcc/0x2c0 drivers/staging/rtl8712/usb_intf.c:599 usb_unbind_interface+0x1d8/0x8d0 drivers/usb/core/driver.c:458 Fixes: 8c213fa59199 ("staging: r8712u: Use asynchronous firmware loading") Cc: stable <stable(a)vger.kernel.org> Reported-and-tested-by: syzbot+c55162be492189fb4f51(a)syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin(a)gmail.com> Link: https://lore.kernel.org/r/20211019211718.26354-1-paskripkin@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Conflicts: drivers/staging/rtl8712/usb_intf.c [The conflict occurs because commit b4383c971bc5("staging: rtl8712: handle firmware load failure") is not merged] Signed-off-by: Jialin Zhang <zhangjialin11(a)huawei.com> --- drivers/staging/rtl8712/usb_intf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/staging/rtl8712/usb_intf.c b/drivers/staging/rtl8712/usb_intf.c index 5e2cdc25401b..709d687180bf 100644 --- a/drivers/staging/rtl8712/usb_intf.c +++ b/drivers/staging/rtl8712/usb_intf.c @@ -623,13 +623,13 @@ static void r871xu_dev_remove(struct usb_interface *pusb_intf) if (pnetdev) { struct _adapter *padapter = netdev_priv(pnetdev); + unregister_netdev(pnetdev); /* will call netdev_close() */ usb_set_intfdata(pusb_intf, NULL); release_firmware(padapter->fw); /* never exit with a firmware callback pending */ wait_for_completion(&padapter->rtl8712_fw_ready); if (drvpriv.drv_registered) padapter->bSurpriseRemoved = true; - unregister_netdev(pnetdev); /* will call netdev_close() */ flush_scheduled_work(); udelay(1); /* Stop driver mlme relation timer */ -- 2.25.1

2 1

[PATCH OLK-6.6] tcp: Use refcount_inc_not_zero() in tcp_twsk_unique().
by Zhengchao Shao 08 Jun '24

08 Jun '24

From: Kuniyuki Iwashima <kuniyu(a)amazon.com> stable inclusion from stable-v6.6.31 commit 6e48faad92be13166184d21506e4e54c79c13adc category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I9U4LA CVE: CVE-2024-36904 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id… -------------------------------- [ Upstream commit f2db7230f73a80dbb179deab78f88a7947f0ab7e ] Anderson Nascimento reported a use-after-free splat in tcp_twsk_unique() with nice analysis. Since commit ec94c2696f0b ("tcp/dccp: avoid one atomic operation for timewait hashdance"), inet_twsk_hashdance() sets TIME-WAIT socket's sk_refcnt after putting it into ehash and releasing the bucket lock. Thus, there is a small race window where other threads could try to reuse the port during connect() and call sock_hold() in tcp_twsk_unique() for the TIME-WAIT socket with zero refcnt. If that happens, the refcnt taken by tcp_twsk_unique() is overwritten and sock_put() will cause underflow, triggering a real use-after-free somewhere else. To avoid the use-after-free, we need to use refcount_inc_not_zero() in tcp_twsk_unique() and give up on reusing the port if it returns false. [0]: refcount_t: addition on 0; use-after-free. WARNING: CPU: 0 PID: 1039313 at lib/refcount.c:25 refcount_warn_saturate+0xe5/0x110 CPU: 0 PID: 1039313 Comm: trigger Not tainted 6.8.6-200.fc39.x86_64 #1 Hardware name: VMware, Inc. VMware20,1/440BX Desktop Reference Platform, BIOS VMW201.00V.21805430.B64.2305221830 05/22/2023 RIP: 0010:refcount_warn_saturate+0xe5/0x110 Code: 42 8e ff 0f 0b c3 cc cc cc cc 80 3d aa 13 ea 01 00 0f 85 5e ff ff ff 48 c7 c7 f8 8e b7 82 c6 05 96 13 ea 01 01 e8 7b 42 8e ff <0f> 0b c3 cc cc cc cc 48 c7 c7 50 8f b7 82 c6 05 7a 13 ea 01 01 e8 RSP: 0018:ffffc90006b43b60 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff888009bb3ef0 RCX: 0000000000000027 RDX: ffff88807be218c8 RSI: 0000000000000001 RDI: ffff88807be218c0 RBP: 0000000000069d70 R08: 0000000000000000 R09: ffffc90006b439f0 R10: ffffc90006b439e8 R11: 0000000000000003 R12: ffff8880029ede84 R13: 0000000000004e20 R14: ffffffff84356dc0 R15: ffff888009bb3ef0 FS: 00007f62c10926c0(0000) GS:ffff88807be00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020ccb000 CR3: 000000004628c005 CR4: 0000000000f70ef0 PKRU: 55555554 Call Trace: <TASK> ? refcount_warn_saturate+0xe5/0x110 ? __warn+0x81/0x130 ? refcount_warn_saturate+0xe5/0x110 ? report_bug+0x171/0x1a0 ? refcount_warn_saturate+0xe5/0x110 ? handle_bug+0x3c/0x80 ? exc_invalid_op+0x17/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? refcount_warn_saturate+0xe5/0x110 tcp_twsk_unique+0x186/0x190 __inet_check_established+0x176/0x2d0 __inet_hash_connect+0x74/0x7d0 ? __pfx___inet_check_established+0x10/0x10 tcp_v4_connect+0x278/0x530 __inet_stream_connect+0x10f/0x3d0 inet_stream_connect+0x3a/0x60 __sys_connect+0xa8/0xd0 __x64_sys_connect+0x18/0x20 do_syscall_64+0x83/0x170 entry_SYSCALL_64_after_hwframe+0x78/0x80 RIP: 0033:0x7f62c11a885d Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d a3 45 0c 00 f7 d8 64 89 01 48 RSP: 002b:00007f62c1091e58 EFLAGS: 00000296 ORIG_RAX: 000000000000002a RAX: ffffffffffffffda RBX: 0000000020ccb004 RCX: 00007f62c11a885d RDX: 0000000000000010 RSI: 0000000020ccb000 RDI: 0000000000000003 RBP: 00007f62c1091e90 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000296 R12: 00007f62c10926c0 R13: ffffffffffffff88 R14: 0000000000000000 R15: 00007ffe237885b0 </TASK> Fixes: ec94c2696f0b ("tcp/dccp: avoid one atomic operation for timewait hashdance") Reported-by: Anderson Nascimento <anderson(a)allelesecurity.com> Closes: https://lore.kernel.org/netdev/37a477a6-d39e-486b-9577-3463f655a6b7@alleles… Suggested-by: Eric Dumazet <edumazet(a)google.com> Signed-off-by: Kuniyuki Iwashima <kuniyu(a)amazon.com> Reviewed-by: Eric Dumazet <edumazet(a)google.com> Link: https://lore.kernel.org/r/20240501213145.62261-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba(a)kernel.org> Signed-off-by: Sasha Levin <sashal(a)kernel.org> Signed-off-by: Zhengchao Shao <shaozhengchao(a)huawei.com> --- net/ipv4/tcp_ipv4.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 39f324bdd576..fce757159ad5 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -154,6 +154,12 @@ int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp) if (tcptw->tw_ts_recent_stamp && (!twp || (reuse && time_after32(ktime_get_seconds(), tcptw->tw_ts_recent_stamp)))) { + /* inet_twsk_hashdance() sets sk_refcnt after putting twsk + * and releasing the bucket lock. + */ + if (unlikely(!refcount_inc_not_zero(&sktw->sk_refcnt))) + return 0; + /* In case of repair and re-using TIME-WAIT sockets we still * want to be sure that it is safe as above but honor the * sequence numbers and time stamps set as part of the repair @@ -174,7 +180,7 @@ int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp) tp->rx_opt.ts_recent = tcptw->tw_ts_recent; tp->rx_opt.ts_recent_stamp = tcptw->tw_ts_recent_stamp; } - sock_hold(sktw); + return 1; } -- 2.34.1

2 1

[PATCH OLK-6.6 v2] blk-iocost: do not WARN if iocg was already offlined
by Li Nan 08 Jun '24

08 Jun '24

mainline inclusion from mainline-v6.9-rc5 commit 01bc4fda9ea0a6b52f12326486f07a4910666cf6 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I9UABH CVE: CVE-2024-36908 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?… -------------------------------- In iocg_pay_debt(), warn is triggered if 'active_list' is empty, which is intended to confirm iocg is active when it has debt. However, warn can be triggered during a blkcg or disk removal, if iocg_waitq_timer_fn() is run at that time: WARNING: CPU: 0 PID: 2344971 at block/blk-iocost.c:1402 iocg_pay_debt+0x14c/0x190 Call trace: iocg_pay_debt+0x14c/0x190 iocg_kick_waitq+0x438/0x4c0 iocg_waitq_timer_fn+0xd8/0x130 __run_hrtimer+0x144/0x45c __hrtimer_run_queues+0x16c/0x244 hrtimer_interrupt+0x2cc/0x7b0 The warn in this situation is meaningless. Since this iocg is being removed, the state of the 'active_list' is irrelevant, and 'waitq_timer' is canceled after removing 'active_list' in ioc_pd_free(), which ensures iocg is freed after iocg_waitq_timer_fn() returns. Therefore, add the check if iocg was already offlined to avoid warn when removing a blkcg or disk. Signed-off-by: Li Nan <linan122(a)huawei.com> Reviewed-by: Yu Kuai <yukuai3(a)huawei.com> Acked-by: Tejun Heo <tj(a)kernel.org> Link: https://lore.kernel.org/r/20240419093257.3004211-1-linan666@huaweicloud.com Signed-off-by: Jens Axboe <axboe(a)kernel.dk> Conflict: block/blk-iocost.c Mainline checks 'pd.online', but there is no member online of pd in 5.10. Check 'iocg->online' instead. Signed-off-by: Li Nan <linan122(a)huawei.com> --- v2: fix bugzilla block/blk-iocost.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/block/blk-iocost.c b/block/blk-iocost.c index 6a3e8a21c0fc..3b9710ce77e5 100644 --- a/block/blk-iocost.c +++ b/block/blk-iocost.c @@ -1439,8 +1439,11 @@ static void iocg_pay_debt(struct ioc_gq *iocg, u64 abs_vpay, lockdep_assert_held(&iocg->ioc->lock); lockdep_assert_held(&iocg->waitq.lock); - /* make sure that nobody messed with @iocg */ - WARN_ON_ONCE(list_empty(&iocg->active_list)); + /* + * make sure that nobody messed with @iocg. Check iocg->online + * to avoid warn when removing blkcg or disk. + */ + WARN_ON_ONCE(list_empty(&iocg->active_list) && iocg->online); WARN_ON_ONCE(iocg->inuse > 1); iocg->abs_vdebt -= min(abs_vpay, iocg->abs_vdebt); -- 2.39.2

2 1

[PATCH OLK-6.6] blk-iocost: do not WARN if iocg was already offlined
by Li Nan 08 Jun '24

08 Jun '24

mainline inclusion from mainline-v6.9-rc5 commit 01bc4fda9ea0a6b52f12326486f07a4910666cf6 category: bugfix bugzilla: 189809 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?… -------------------------------- In iocg_pay_debt(), warn is triggered if 'active_list' is empty, which is intended to confirm iocg is active when it has debt. However, warn can be triggered during a blkcg or disk removal, if iocg_waitq_timer_fn() is run at that time: WARNING: CPU: 0 PID: 2344971 at block/blk-iocost.c:1402 iocg_pay_debt+0x14c/0x190 Call trace: iocg_pay_debt+0x14c/0x190 iocg_kick_waitq+0x438/0x4c0 iocg_waitq_timer_fn+0xd8/0x130 __run_hrtimer+0x144/0x45c __hrtimer_run_queues+0x16c/0x244 hrtimer_interrupt+0x2cc/0x7b0 The warn in this situation is meaningless. Since this iocg is being removed, the state of the 'active_list' is irrelevant, and 'waitq_timer' is canceled after removing 'active_list' in ioc_pd_free(), which ensures iocg is freed after iocg_waitq_timer_fn() returns. Therefore, add the check if iocg was already offlined to avoid warn when removing a blkcg or disk. Signed-off-by: Li Nan <linan122(a)huawei.com> Reviewed-by: Yu Kuai <yukuai3(a)huawei.com> Acked-by: Tejun Heo <tj(a)kernel.org> Link: https://lore.kernel.org/r/20240419093257.3004211-1-linan666@huaweicloud.com Signed-off-by: Jens Axboe <axboe(a)kernel.dk> Conflict: block/blk-iocost.c Mainline checks 'pd.online', but there is no member online of pd in 5.10. Check 'iocg->online' instead. Signed-off-by: Li Nan <linan122(a)huawei.com> --- block/blk-iocost.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/block/blk-iocost.c b/block/blk-iocost.c index 6a3e8a21c0fc..3b9710ce77e5 100644 --- a/block/blk-iocost.c +++ b/block/blk-iocost.c @@ -1439,8 +1439,11 @@ static void iocg_pay_debt(struct ioc_gq *iocg, u64 abs_vpay, lockdep_assert_held(&iocg->ioc->lock); lockdep_assert_held(&iocg->waitq.lock); - /* make sure that nobody messed with @iocg */ - WARN_ON_ONCE(list_empty(&iocg->active_list)); + /* + * make sure that nobody messed with @iocg. Check iocg->online + * to avoid warn when removing blkcg or disk. + */ + WARN_ON_ONCE(list_empty(&iocg->active_list) && iocg->online); WARN_ON_ONCE(iocg->inuse > 1); iocg->abs_vdebt -= min(abs_vpay, iocg->abs_vdebt); -- 2.39.2

2 1

[PATCH openEuler-22.03-LTS-SP1] blk-iocost: do not WARN if iocg was already offlined
by Li Nan 08 Jun '24

08 Jun '24

mainline inclusion from mainline-v6.9-rc5 commit 01bc4fda9ea0a6b52f12326486f07a4910666cf6 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I9UABH CVE: CVE-2024-36908 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?… -------------------------------- In iocg_pay_debt(), warn is triggered if 'active_list' is empty, which is intended to confirm iocg is active when it has debt. However, warn can be triggered during a blkcg or disk removal, if iocg_waitq_timer_fn() is run at that time: WARNING: CPU: 0 PID: 2344971 at block/blk-iocost.c:1402 iocg_pay_debt+0x14c/0x190 Call trace: iocg_pay_debt+0x14c/0x190 iocg_kick_waitq+0x438/0x4c0 iocg_waitq_timer_fn+0xd8/0x130 __run_hrtimer+0x144/0x45c __hrtimer_run_queues+0x16c/0x244 hrtimer_interrupt+0x2cc/0x7b0 The warn in this situation is meaningless. Since this iocg is being removed, the state of the 'active_list' is irrelevant, and 'waitq_timer' is canceled after removing 'active_list' in ioc_pd_free(), which ensures iocg is freed after iocg_waitq_timer_fn() returns. Therefore, add the check if iocg was already offlined to avoid warn when removing a blkcg or disk. Signed-off-by: Li Nan <linan122(a)huawei.com> Reviewed-by: Yu Kuai <yukuai3(a)huawei.com> Acked-by: Tejun Heo <tj(a)kernel.org> Link: https://lore.kernel.org/r/20240419093257.3004211-1-linan666@huaweicloud.com Signed-off-by: Jens Axboe <axboe(a)kernel.dk> Conflict: block/blk-iocost.c Mainline checks 'pd.online', but there is no member online of pd in 5.10. Check 'iocg->online' instead. Signed-off-by: Li Nan <linan122(a)huawei.com> --- block/blk-iocost.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/block/blk-iocost.c b/block/blk-iocost.c index edab843d547a..2b3bbae302e2 100644 --- a/block/blk-iocost.c +++ b/block/blk-iocost.c @@ -1401,8 +1401,11 @@ static void iocg_pay_debt(struct ioc_gq *iocg, u64 abs_vpay, lockdep_assert_held(&iocg->ioc->lock); lockdep_assert_held(&iocg->waitq.lock); - /* make sure that nobody messed with @iocg */ - WARN_ON_ONCE(list_empty(&iocg->active_list)); + /* + * make sure that nobody messed with @iocg. Check iocg->online + * to avoid warn when removing blkcg or disk. + */ + WARN_ON_ONCE(list_empty(&iocg->active_list) && iocg->online); WARN_ON_ONCE(iocg->inuse > 1); iocg->abs_vdebt -= min(abs_vpay, iocg->abs_vdebt); -- 2.39.2

2 1

[PATCH OLK-5.10] blk-iocost: do not WARN if iocg was already offlined
by Li Nan 08 Jun '24

08 Jun '24

mainline inclusion from mainline-v6.9-rc5 commit 01bc4fda9ea0a6b52f12326486f07a4910666cf6 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I9UABH CVE: CVE-2024-36908 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?… -------------------------------- In iocg_pay_debt(), warn is triggered if 'active_list' is empty, which is intended to confirm iocg is active when it has debt. However, warn can be triggered during a blkcg or disk removal, if iocg_waitq_timer_fn() is run at that time: WARNING: CPU: 0 PID: 2344971 at block/blk-iocost.c:1402 iocg_pay_debt+0x14c/0x190 Call trace: iocg_pay_debt+0x14c/0x190 iocg_kick_waitq+0x438/0x4c0 iocg_waitq_timer_fn+0xd8/0x130 __run_hrtimer+0x144/0x45c __hrtimer_run_queues+0x16c/0x244 hrtimer_interrupt+0x2cc/0x7b0 The warn in this situation is meaningless. Since this iocg is being removed, the state of the 'active_list' is irrelevant, and 'waitq_timer' is canceled after removing 'active_list' in ioc_pd_free(), which ensures iocg is freed after iocg_waitq_timer_fn() returns. Therefore, add the check if iocg was already offlined to avoid warn when removing a blkcg or disk. Signed-off-by: Li Nan <linan122(a)huawei.com> Reviewed-by: Yu Kuai <yukuai3(a)huawei.com> Acked-by: Tejun Heo <tj(a)kernel.org> Link: https://lore.kernel.org/r/20240419093257.3004211-1-linan666@huaweicloud.com Signed-off-by: Jens Axboe <axboe(a)kernel.dk> Conflict: block/blk-iocost.c Mainline checks 'pd.online', but there is no member online of pd in 5.10. Check 'iocg->online' instead. Signed-off-by: Li Nan <linan122(a)huawei.com> --- block/blk-iocost.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/block/blk-iocost.c b/block/blk-iocost.c index edab843d547a..2b3bbae302e2 100644 --- a/block/blk-iocost.c +++ b/block/blk-iocost.c @@ -1401,8 +1401,11 @@ static void iocg_pay_debt(struct ioc_gq *iocg, u64 abs_vpay, lockdep_assert_held(&iocg->ioc->lock); lockdep_assert_held(&iocg->waitq.lock); - /* make sure that nobody messed with @iocg */ - WARN_ON_ONCE(list_empty(&iocg->active_list)); + /* + * make sure that nobody messed with @iocg. Check iocg->online + * to avoid warn when removing blkcg or disk. + */ + WARN_ON_ONCE(list_empty(&iocg->active_list) && iocg->online); WARN_ON_ONCE(iocg->inuse > 1); iocg->abs_vdebt -= min(abs_vpay, iocg->abs_vdebt); -- 2.39.2

2 1

[PATCH openEuler-22.03-LTS-SP1] net/smc: avoid data corruption caused by decline
by Liu Jian 08 Jun '24

08 Jun '24

From: "D. Wythe" <alibuda(a)linux.alibaba.com> stable inclusion from stable-v5.10.203 commit 5ada292b5c504720a0acef8cae9acc62a694d19c category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I9R4L7 CVE: CVE-2023-52775 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id… -------------------------------- [ Upstream commit e6d71b437abc2f249e3b6a1ae1a7228e09c6e563 ] We found a data corruption issue during testing of SMC-R on Redis applications. The benchmark has a low probability of reporting a strange error as shown below. "Error: Protocol error, got "\xe2" as reply type byte" Finally, we found that the retrieved error data was as follows: 0xE2 0xD4 0xC3 0xD9 0x04 0x00 0x2C 0x20 0xA6 0x56 0x00 0x16 0x3E 0x0C 0xCB 0x04 0x02 0x01 0x00 0x00 0x20 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0xE2 It is quite obvious that this is a SMC DECLINE message, which means that the applications received SMC protocol message. We found that this was caused by the following situations: client server ¦ clc proposal -------------> ¦ clc accept <------------- ¦ clc confirm -------------> wait llc confirm send llc confirm ¦failed llc confirm ¦ x------ (after 2s)timeout wait llc confirm rsp wait decline (after 1s) timeout (after 2s) timeout ¦ decline --------------> ¦ decline <-------------- As a result, a decline message was sent in the implementation, and this message was read from TCP by the already-fallback connection. This patch double the client timeout as 2x of the server value, With this simple change, the Decline messages should never cross or collide (during Confirm link timeout). This issue requires an immediate solution, since the protocol updates involve a more long-term solution. Fixes: 0fb0b02bd6fd ("net/smc: adapt SMC client code to use the LLC flow") Signed-off-by: D. Wythe <alibuda(a)linux.alibaba.com> Reviewed-by: Wen Gu <guwen(a)linux.alibaba.com> Reviewed-by: Wenjia Zhang <wenjia(a)linux.ibm.com> Signed-off-by: David S. Miller <davem(a)davemloft.net> Signed-off-by: Sasha Levin <sashal(a)kernel.org> Signed-off-by: sanglipeng <sanglipeng1(a)jd.com> Signed-off-by: Liu Jian <liujian56(a)huawei.com> --- net/smc/af_smc.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index 41cbc7c89c9d..f277e5fc8056 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -396,8 +396,12 @@ static int smcr_clnt_conf_first_link(struct smc_sock *smc) struct smc_llc_qentry *qentry; int rc; - /* receive CONFIRM LINK request from server over RoCE fabric */ - qentry = smc_llc_wait(link->lgr, NULL, SMC_LLC_WAIT_TIME, + /* Receive CONFIRM LINK request from server over RoCE fabric. + * Increasing the client's timeout by twice as much as the server's + * timeout by default can temporarily avoid decline messages of + * both sides crossing or colliding + */ + qentry = smc_llc_wait(link->lgr, NULL, 2 * SMC_LLC_WAIT_TIME, SMC_LLC_CONFIRM_LINK); if (!qentry) { struct smc_clc_msg_decline dclc; -- 2.34.1

2 1

[PATCH openEuler-1.0-LTS] PCI/PM: Drain runtime-idle callbacks before driver removal
by Jialin Zhang 08 Jun '24

08 Jun '24

From: "Rafael J. Wysocki" <rafael.j.wysocki(a)intel.com> stable inclusion from stable-v4.19.312 commit 9a87375bb586515c0af63d5dcdcd58ec4acf20a6 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I9Q91J CVE: CVE-2024-35809 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id… -------------------------------- [ Upstream commit 9d5286d4e7f68beab450deddbb6a32edd5ecf4bf ] A race condition between the .runtime_idle() callback and the .remove() callback in the rtsx_pcr PCI driver leads to a kernel crash due to an unhandled page fault [1]. The problem is that rtsx_pci_runtime_idle() is not expected to be running after pm_runtime_get_sync() has been called, but the latter doesn't really guarantee that. It only guarantees that the suspend and resume callbacks will not be running when it returns. However, if a .runtime_idle() callback is already running when pm_runtime_get_sync() is called, the latter will notice that the runtime PM status of the device is RPM_ACTIVE and it will return right away without waiting for the former to complete. In fact, it cannot wait for .runtime_idle() to complete because it may be called from that callback (it arguably does not make much sense to do that, but it is not strictly prohibited). Thus in general, whoever is providing a .runtime_idle() callback needs to protect it from running in parallel with whatever code runs after pm_runtime_get_sync(). [Note that .runtime_idle() will not start after pm_runtime_get_sync() has returned, but it may continue running then if it has started earlier.] One way to address that race condition is to call pm_runtime_barrier() after pm_runtime_get_sync() (not before it, because a nonzero value of the runtime PM usage counter is necessary to prevent runtime PM callbacks from being invoked) to wait for the .runtime_idle() callback to complete should it be running at that point. A suitable place for doing that is in pci_device_remove() which calls pm_runtime_get_sync() before removing the driver, so it may as well call pm_runtime_barrier() subsequently, which will prevent the race in question from occurring, not just in the rtsx_pcr driver, but in any PCI drivers providing .runtime_idle() callbacks. Link: https://lore.kernel.org/lkml/20240229062201.49500-1-kai.heng.feng@canonical… # [1] Link: https://lore.kernel.org/r/5761426.DvuYhMxLoT@kreacher Reported-by: Kai-Heng Feng <kai.heng.feng(a)canonical.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com> Tested-by: Ricky Wu <ricky_wu(a)realtek.com> Acked-by: Kai-Heng Feng <kai.heng.feng(a)canonical.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Sasha Levin <sashal(a)kernel.org> Conflicts: drivers/pci/pci-driver.c [The conflict occurs because commit 064300ccb0e2("PCI: Drop pci_device_remove() test of pci_dev->driver") is not merged] Signed-off-by: Jialin Zhang <zhangjialin11(a)huawei.com> --- drivers/pci/pci-driver.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index 9329f4881d42..dc33bad9f106 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -442,6 +442,13 @@ static int pci_device_remove(struct device *dev) if (drv) { if (drv->remove) { pm_runtime_get_sync(dev); + /* + * If the driver provides a .runtime_idle() callback and it has + * started to run already, it may continue to run in parallel + * with the code below, so wait until all of the runtime PM + * activity has completed. + */ + pm_runtime_barrier(dev); drv->remove(pci_dev); pm_runtime_put_noidle(dev); } -- 2.25.1

2 1

[PATCH openEuler-22.03-LTS-SP1] PCI/PM: Drain runtime-idle callbacks before driver removal
by Jialin Zhang 08 Jun '24

08 Jun '24

From: "Rafael J. Wysocki" <rafael.j.wysocki(a)intel.com> stable inclusion from stable-v5.10.215 commit bbe068b24409ef740657215605284fc7cdddd491 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I9Q91J CVE: CVE-2024-35809 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id… -------------------------------- [ Upstream commit 9d5286d4e7f68beab450deddbb6a32edd5ecf4bf ] A race condition between the .runtime_idle() callback and the .remove() callback in the rtsx_pcr PCI driver leads to a kernel crash due to an unhandled page fault [1]. The problem is that rtsx_pci_runtime_idle() is not expected to be running after pm_runtime_get_sync() has been called, but the latter doesn't really guarantee that. It only guarantees that the suspend and resume callbacks will not be running when it returns. However, if a .runtime_idle() callback is already running when pm_runtime_get_sync() is called, the latter will notice that the runtime PM status of the device is RPM_ACTIVE and it will return right away without waiting for the former to complete. In fact, it cannot wait for .runtime_idle() to complete because it may be called from that callback (it arguably does not make much sense to do that, but it is not strictly prohibited). Thus in general, whoever is providing a .runtime_idle() callback needs to protect it from running in parallel with whatever code runs after pm_runtime_get_sync(). [Note that .runtime_idle() will not start after pm_runtime_get_sync() has returned, but it may continue running then if it has started earlier.] One way to address that race condition is to call pm_runtime_barrier() after pm_runtime_get_sync() (not before it, because a nonzero value of the runtime PM usage counter is necessary to prevent runtime PM callbacks from being invoked) to wait for the .runtime_idle() callback to complete should it be running at that point. A suitable place for doing that is in pci_device_remove() which calls pm_runtime_get_sync() before removing the driver, so it may as well call pm_runtime_barrier() subsequently, which will prevent the race in question from occurring, not just in the rtsx_pcr driver, but in any PCI drivers providing .runtime_idle() callbacks. Link: https://lore.kernel.org/lkml/20240229062201.49500-1-kai.heng.feng@canonical… # [1] Link: https://lore.kernel.org/r/5761426.DvuYhMxLoT@kreacher Reported-by: Kai-Heng Feng <kai.heng.feng(a)canonical.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com> Tested-by: Ricky Wu <ricky_wu(a)realtek.com> Acked-by: Kai-Heng Feng <kai.heng.feng(a)canonical.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Sasha Levin <sashal(a)kernel.org> Conflicts: drivers/pci/pci-driver.c [The conflict occurs because commit 39f7310eaa79("PCI: Drop pci_device_remove() test of pci_dev->driver") is not merged] Signed-off-by: Jialin Zhang <zhangjialin11(a)huawei.com> --- drivers/pci/pci-driver.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index 79aa5a4b61c5..b740a7dc7fcd 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -447,6 +447,13 @@ static int pci_device_remove(struct device *dev) if (drv) { if (drv->remove) { pm_runtime_get_sync(dev); + /* + * If the driver provides a .runtime_idle() callback and it has + * started to run already, it may continue to run in parallel + * with the code below, so wait until all of the runtime PM + * activity has completed. + */ + pm_runtime_barrier(dev); drv->remove(pci_dev); pm_runtime_put_noidle(dev); } -- 2.25.1

2 1

[PATCH OLK-5.10] PCI/PM: Drain runtime-idle callbacks before driver removal
by Jialin Zhang 08 Jun '24

08 Jun '24

From: "Rafael J. Wysocki" <rafael.j.wysocki(a)intel.com> stable inclusion from stable-v5.10.215 commit bbe068b24409ef740657215605284fc7cdddd491 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I9Q91J CVE: CVE-2024-35809 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id… -------------------------------- [ Upstream commit 9d5286d4e7f68beab450deddbb6a32edd5ecf4bf ] A race condition between the .runtime_idle() callback and the .remove() callback in the rtsx_pcr PCI driver leads to a kernel crash due to an unhandled page fault [1]. The problem is that rtsx_pci_runtime_idle() is not expected to be running after pm_runtime_get_sync() has been called, but the latter doesn't really guarantee that. It only guarantees that the suspend and resume callbacks will not be running when it returns. However, if a .runtime_idle() callback is already running when pm_runtime_get_sync() is called, the latter will notice that the runtime PM status of the device is RPM_ACTIVE and it will return right away without waiting for the former to complete. In fact, it cannot wait for .runtime_idle() to complete because it may be called from that callback (it arguably does not make much sense to do that, but it is not strictly prohibited). Thus in general, whoever is providing a .runtime_idle() callback needs to protect it from running in parallel with whatever code runs after pm_runtime_get_sync(). [Note that .runtime_idle() will not start after pm_runtime_get_sync() has returned, but it may continue running then if it has started earlier.] One way to address that race condition is to call pm_runtime_barrier() after pm_runtime_get_sync() (not before it, because a nonzero value of the runtime PM usage counter is necessary to prevent runtime PM callbacks from being invoked) to wait for the .runtime_idle() callback to complete should it be running at that point. A suitable place for doing that is in pci_device_remove() which calls pm_runtime_get_sync() before removing the driver, so it may as well call pm_runtime_barrier() subsequently, which will prevent the race in question from occurring, not just in the rtsx_pcr driver, but in any PCI drivers providing .runtime_idle() callbacks. Link: https://lore.kernel.org/lkml/20240229062201.49500-1-kai.heng.feng@canonical… # [1] Link: https://lore.kernel.org/r/5761426.DvuYhMxLoT@kreacher Reported-by: Kai-Heng Feng <kai.heng.feng(a)canonical.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com> Tested-by: Ricky Wu <ricky_wu(a)realtek.com> Acked-by: Kai-Heng Feng <kai.heng.feng(a)canonical.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Sasha Levin <sashal(a)kernel.org> Conflicts: drivers/pci/pci-driver.c [The conflict occurs because commit 39f7310eaa79("PCI: Drop pci_device_remove() test of pci_dev->driver") is not merged] Signed-off-by: Jialin Zhang <zhangjialin11(a)huawei.com> --- drivers/pci/pci-driver.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index 6eaaf4231635..606e4a083337 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -447,6 +447,13 @@ static int pci_device_remove(struct device *dev) if (drv) { if (drv->remove) { pm_runtime_get_sync(dev); + /* + * If the driver provides a .runtime_idle() callback and it has + * started to run already, it may continue to run in parallel + * with the code below, so wait until all of the runtime PM + * activity has completed. + */ + pm_runtime_barrier(dev); drv->remove(pci_dev); pm_runtime_put_noidle(dev); } -- 2.25.1

2 1