From: Nikolay Aleksandrov razor@blackwall.org
mainline inclusion from mainline-v6.3-rc3 commit 9ec7eb60dcbcb6c41076defbc5df7bbd95ceaba5 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6WNGK CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
---------------------------
Add bond_ether_setup helper which is used to fix ether_setup() calls in the bonding driver. It takes care of both IFF_MASTER and IFF_SLAVE flags, the former is always restored and the latter only if it was set. If the bond enslaves non-ARPHRD_ETHER device (changes its type), then releases it and enslaves ARPHRD_ETHER device (changes back) then we use ether_setup() to restore the bond device type but it also resets its flags and removes IFF_MASTER and IFF_SLAVE[1]. Use the bond_ether_setup helper to restore both after such transition.
[1] reproduce (nlmon is non-ARPHRD_ETHER): $ ip l add nlmon0 type nlmon $ ip l add bond2 type bond mode active-backup $ ip l set nlmon0 master bond2 $ ip l set nlmon0 nomaster $ ip l add bond1 type bond (we use bond1 as ARPHRD_ETHER device to restore bond2's mode) $ ip l set bond1 master bond2 $ ip l sh dev bond2 37: bond2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether be:d7:c5:40:5b:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 1500 (notice bond2's IFF_MASTER is missing)
Fixes: e36b9d16c6a6 ("bonding: clean muticast addresses when device changes type") Signed-off-by: Nikolay Aleksandrov razor@blackwall.org Signed-off-by: David S. Miller davem@davemloft.net Conflicts: drivers/net/bonding/bond_main.c Signed-off-by: Ziyang Xuan william.xuanziyang@huawei.com Reviewed-by: Liu Jian liujian56@huawei.com Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- drivers/net/bonding/bond_main.c | 19 +++++++++++++++---- 1 file changed, 15 insertions(+), 4 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 02965c5c1ed2..b0ecbb293bbf 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -1391,6 +1391,19 @@ void bond_lower_state_changed(struct slave *slave) netdev_lower_state_changed(slave->dev, &info); }
+/* The bonding driver uses ether_setup() to convert a master bond device + * to ARPHRD_ETHER, that resets the target netdevice's flags so we always + * have to restore the IFF_MASTER flag, and only restore IFF_SLAVE if it was set + */ +static void bond_ether_setup(struct net_device *bond_dev) +{ + unsigned int slave_flag = bond_dev->flags & IFF_SLAVE; + + ether_setup(bond_dev); + bond_dev->flags |= IFF_MASTER | slave_flag; + bond_dev->priv_flags &= ~IFF_TX_SKB_SHARING; +} + /* enslave device <slave> to bond device <master> */ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev, struct netlink_ext_ack *extack) @@ -1481,10 +1494,8 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
if (slave_dev->type != ARPHRD_ETHER) bond_setup_by_slave(bond_dev, slave_dev); - else { - ether_setup(bond_dev); - bond_dev->priv_flags &= ~IFF_TX_SKB_SHARING; - } + else + bond_ether_setup(bond_dev);
call_netdevice_notifiers(NETDEV_POST_TYPE_CHANGE, bond_dev);
From: Nikolay Aleksandrov razor@blackwall.org
mainline inclusion from mainline-v6.3-rc3 commit e667d469098671261d558be0cd93dca4d285ce1e category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6WNGK CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
---------------------------
syzbot reported a warning[1] where the bond device itself is a slave and we try to enslave a non-ethernet device as the first slave which fails but then in the error path when ether_setup() restores the bond device it also clears all flags. In my previous fix[2] I restored the IFF_MASTER flag, but I didn't consider the case that the bond device itself might also be a slave with IFF_SLAVE set, so we need to restore that flag as well. Use the bond_ether_setup helper which does the right thing and restores the bond's flags properly.
Steps to reproduce using a nlmon dev: $ ip l add nlmon0 type nlmon $ ip l add bond1 type bond $ ip l add bond2 type bond $ ip l set bond1 master bond2 $ ip l set dev nlmon0 master bond1 $ ip -d l sh dev bond1 22: bond1: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noqueue master bond2 state DOWN mode DEFAULT group default qlen 1000 (now bond1's IFF_SLAVE flag is gone and we'll hit a warning[3] if we try to delete it)
[1] https://syzkaller.appspot.com/bug?id=391c7b1f6522182899efba27d891f1743e8eb3e... [2] commit 7d5cd2ce5292 ("bonding: correctly handle bonding type change on enslave failure") [3] example warning: [ 27.008664] bond1: (slave nlmon0): The slave device specified does not support setting the MAC address [ 27.008692] bond1: (slave nlmon0): Error -95 calling set_mac_address [ 32.464639] bond1 (unregistering): Released all slaves [ 32.464685] ------------[ cut here ]------------ [ 32.464686] WARNING: CPU: 1 PID: 2004 at net/core/dev.c:10829 unregister_netdevice_many+0x72a/0x780 [ 32.464694] Modules linked in: br_netfilter bridge bonding virtio_net [ 32.464699] CPU: 1 PID: 2004 Comm: ip Kdump: loaded Not tainted 5.18.0-rc3+ #47 [ 32.464703] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.1-2.fc37 04/01/2014 [ 32.464704] RIP: 0010:unregister_netdevice_many+0x72a/0x780 [ 32.464707] Code: 99 fd ff ff ba 90 1a 00 00 48 c7 c6 f4 02 66 96 48 c7 c7 20 4d 35 96 c6 05 fa c7 2b 02 01 e8 be 6f 4a 00 0f 0b e9 73 fd ff ff <0f> 0b e9 5f fd ff ff 80 3d e3 c7 2b 02 00 0f 85 3b fd ff ff ba 59 [ 32.464710] RSP: 0018:ffffa006422d7820 EFLAGS: 00010206 [ 32.464712] RAX: ffff8f6e077140a0 RBX: ffffa006422d7888 RCX: 0000000000000000 [ 32.464714] RDX: ffff8f6e12edbe58 RSI: 0000000000000296 RDI: ffffffff96d4a520 [ 32.464716] RBP: ffff8f6e07714000 R08: ffffffff96d63600 R09: ffffa006422d7728 [ 32.464717] R10: 0000000000000ec0 R11: ffffffff9698c988 R12: ffff8f6e12edb140 [ 32.464719] R13: dead000000000122 R14: dead000000000100 R15: ffff8f6e12edb140 [ 32.464723] FS: 00007f297c2f1740(0000) GS:ffff8f6e5d900000(0000) knlGS:0000000000000000 [ 32.464725] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 32.464726] CR2: 00007f297bf1c800 CR3: 00000000115e8000 CR4: 0000000000350ee0 [ 32.464730] Call Trace: [ 32.464763] <TASK> [ 32.464767] rtnl_dellink+0x13e/0x380 [ 32.464776] ? cred_has_capability.isra.0+0x68/0x100 [ 32.464780] ? __rtnl_unlock+0x33/0x60 [ 32.464783] ? bpf_lsm_capset+0x10/0x10 [ 32.464786] ? security_capable+0x36/0x50 [ 32.464790] rtnetlink_rcv_msg+0x14e/0x3b0 [ 32.464792] ? _copy_to_iter+0xb1/0x790 [ 32.464796] ? post_alloc_hook+0xa0/0x160 [ 32.464799] ? rtnl_calcit.isra.0+0x110/0x110 [ 32.464802] netlink_rcv_skb+0x50/0xf0 [ 32.464806] netlink_unicast+0x216/0x340 [ 32.464809] netlink_sendmsg+0x23f/0x480 [ 32.464812] sock_sendmsg+0x5e/0x60 [ 32.464815] ____sys_sendmsg+0x22c/0x270 [ 32.464818] ? import_iovec+0x17/0x20 [ 32.464821] ? sendmsg_copy_msghdr+0x59/0x90 [ 32.464823] ? do_set_pte+0xa0/0xe0 [ 32.464828] ___sys_sendmsg+0x81/0xc0 [ 32.464832] ? mod_objcg_state+0xc6/0x300 [ 32.464835] ? refill_obj_stock+0xa9/0x160 [ 32.464838] ? memcg_slab_free_hook+0x1a5/0x1f0 [ 32.464842] __sys_sendmsg+0x49/0x80 [ 32.464847] do_syscall_64+0x3b/0x90 [ 32.464851] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 32.464865] RIP: 0033:0x7f297bf2e5e7 [ 32.464868] Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10 [ 32.464869] RSP: 002b:00007ffd96c824c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 32.464872] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f297bf2e5e7 [ 32.464874] RDX: 0000000000000000 RSI: 00007ffd96c82540 RDI: 0000000000000003 [ 32.464875] RBP: 00000000640f19de R08: 0000000000000001 R09: 000000000000007c [ 32.464876] R10: 00007f297bffabe0 R11: 0000000000000246 R12: 0000000000000001 [ 32.464877] R13: 00007ffd96c82d20 R14: 00007ffd96c82610 R15: 000055bfe38a7020 [ 32.464881] </TASK> [ 32.464882] ---[ end trace 0000000000000000 ]---
Fixes: 7d5cd2ce5292 ("bonding: correctly handle bonding type change on enslave failure") Reported-by: syzbot+9dfc3f3348729cc82277@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=391c7b1f6522182899efba27d891f1743e8eb3e... Signed-off-by: Nikolay Aleksandrov razor@blackwall.org Reviewed-by: Michal Kubiak michal.kubiak@intel.com Acked-by: Jonathan Toppins jtoppins@redhat.com Acked-by: Jay Vosburgh jay.vosburgh@canonical.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Ziyang Xuan william.xuanziyang@huawei.com Reviewed-by: Liu Jian liujian56@huawei.com Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- drivers/net/bonding/bond_main.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index b0ecbb293bbf..08939a35a187 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -1884,9 +1884,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev, eth_hw_addr_random(bond_dev); if (bond_dev->type != ARPHRD_ETHER) { dev_close(bond_dev); - ether_setup(bond_dev); - bond_dev->flags |= IFF_MASTER; - bond_dev->priv_flags &= ~IFF_TX_SKB_SHARING; + bond_ether_setup(bond_dev); } }
From: Frederic Weisbecker frederic@kernel.org
mainline inclusion from mainline-v5.16-rc4 commit 53e87e3cdc155f20c3417b689df8d2ac88d79576 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I6WCC1 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
When at least one CPU runs in nohz_full mode, a dedicated timekeeper CPU is guaranteed to stay online and to never stop its tick.
Meanwhile on some rare case, the dedicated timekeeper may be running with interrupts disabled for a while, such as in stop_machine.
If jiffies stop being updated, a nohz_full CPU may end up endlessly programming the next tick in the past, taking the last jiffies update monotonic timestamp as a stale base, resulting in an tick storm.
Here is a scenario where it matters:
0) CPU 0 is the timekeeper and CPU 1 a nohz_full CPU.
1) A stop machine callback is queued to execute somewhere.
2) CPU 0 reaches MULTI_STOP_DISABLE_IRQ while CPU 1 is still in MULTI_STOP_PREPARE. Hence CPU 0 can't do its timekeeping duty. CPU 1 can still take IRQs.
3) CPU 1 receives an IRQ which queues a timer callback one jiffy forward.
4) On IRQ exit, CPU 1 schedules the tick one jiffy forward, taking last_jiffies_update as a base. But last_jiffies_update hasn't been updated for 2 jiffies since the timekeeper has interrupts disabled.
5) clockevents_program_event(), which relies on ktime_get(), observes that the expiration is in the past and therefore programs the min delta event on the clock.
6) The tick fires immediately, goto 3)
7) Tick storm, the nohz_full CPU is drown and takes ages to reach MULTI_STOP_DISABLE_IRQ, which is the only way out of this situation.
Solve this with unconditionally updating jiffies if the value is stale on nohz_full IRQ entry. IRQs and other disturbances are expected to be rare enough on nohz_full for the unconditional call to ktime_get() to actually matter.
Reported-by: Paul E. McKenney paulmck@kernel.org Signed-off-by: Frederic Weisbecker frederic@kernel.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Tested-by: Paul E. McKenney paulmck@kernel.org Link: https://lore.kernel.org/r/20211026141055.57358-2-frederic@kernel.org
Conflicts: kernel/softirq.c
Signed-off-by: Yu Liao liaoyu15@huawei.com Reviewed-by: Xiongfeng Wang wangxiongfeng2@huawei.com Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com --- kernel/softirq.c | 3 ++- kernel/time/tick-sched.c | 7 +++++++ 2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/kernel/softirq.c b/kernel/softirq.c index 4daab24bd4e2..99a047f70fd2 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -344,7 +344,8 @@ asmlinkage __visible void do_softirq(void) void irq_enter(void) { rcu_irq_enter(); - if (is_idle_task(current) && !in_interrupt()) { + if (tick_nohz_full_cpu(smp_processor_id()) || + (is_idle_task(current) && !in_interrupt())) { /* * Prevent raise_softirq from needlessly waking up ksoftirqd * here, as softirq will be serviced on return from interrupt. diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 5b33e2f5c0ed..03f940089431 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -1228,6 +1228,13 @@ static inline void tick_nohz_irq_enter(void) now = ktime_get(); if (ts->idle_active) tick_nohz_stop_idle(ts, now); + /* + * If all CPUs are idle. We may need to update a stale jiffies value. + * Note nohz_full is a special case: a timekeeper is guaranteed to stay + * alive but it might be busy looping with interrupts disabled in some + * rare case (typically stop machine). So we must make sure we have a + * last resort. + */ if (ts->tick_stopped) tick_nohz_update_jiffies(now); }