Offering: HULK
hulk inclusion
category: bugfix
bugzilla: 189748
--------------------------------
When CPUs are offlined, the hotplug path holds cpu_hotplug_lock and
invokes cpuset_hotplug_workfn asynchronously. The work function
repeatedly takes and releases cpuset_mutex while updating cpusets, and
cpu_hotplug_lock is released before cpuset_hotplug_workfn finishes. As a
result, interfaces that hold both locks, such as cpuset_write_resmask,
may rebuild the scheduler domains while some cpusets have not yet been
refreshed. This can generate domains that contain CPUs being offlined
and lead to a panic.
As commit 406100f3da08 ("cpuset: fix race between hotplug work and later
CPU offline") mentions, this problem can happen in cgroup v2. It can
also happen in a cgroup v1 stress test that onlines and offlines CPUs
while writing cpuset.cpus to rebuild the domains with sched_load_balance
off:
CPU: 16 PID: 2815 Comm: bash Not tainted 4.19.90+ #14
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-4
RIP: 0010:build_sched_domains+0x3c5/0xe50
Code: 10 c7 43 48 9a 04 00 00 89 4b 50 f6 c4 02 74 28 48 63 ce 48 8b 45 20 48 8b 0c cd 0
RSP: 0018:ffffc90001e13d58 EFLAGS: 00000202
RAX: 2eeb0d75c0854802 RBX: ffff888103af4600 RCX: ffffffff81046d20
RDX: 0000000000000040 RSI: 0000000000000040 RDI: ffff888103af4738
RBP: ffff88810162c600 R08: ffff888103c858a0 R09: 00000000fffd7bac
R10: 00000000fffd7bac R11: 0000000000000401 R12: 0000000000000002
R13: ffff888103c858a0 R14: 0000000000000000 R15: 0000000000000000
FS:  00007f70a5bf6740(0000) GS:ffff888237800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffc660b0020 CR3: 000000010d716000 CR4: 00000000000006e0
Call Trace:
 partition_sched_domains+0x240/0x31a
 cpuset_write_resmask+0x51c/0x770
 kernfs_fop_write+0xf8/0x180
 vfs_write+0xaf/0x190
 ksys_write+0x52/0xc0
 do_syscall_64+0x47/0x170
 entry_SYSCALL_64_after_hwframe+0x5c/0xc1
Every CPU in the domains passed to partition_sched_domains must be
active. So check the domains right after generate_sched_domains and skip
the rebuild if any of them contains a CPU that is not in
cpu_active_mask.
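The check can be illustrated outside the kernel with a minimal userspace
sketch. It is not the kernel code: mask_t, mask_subset and
doms_all_active are hypothetical names introduced only for this
illustration, and a 64-bit word stands in for a cpumask. mask_subset is
assumed to mirror the semantics of cpumask_subset (every bit set in the
first mask is also set in the second).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for a cpumask: bit n set means CPU n. */
typedef uint64_t mask_t;

/* True if every CPU in 'a' is also in 'b' (modeled on cpumask_subset). */
static bool mask_subset(mask_t a, mask_t b)
{
	return (a & ~b) == 0;
}

/*
 * True if the rebuild may proceed: every domain contains only active
 * CPUs. A null 'doms' array is not rejected, matching the
 * "doms && ..." guard in the patch.
 */
static bool doms_all_active(const mask_t *doms, int ndoms, mask_t active)
{
	if (!doms)
		return true;

	for (int i = 0; i < ndoms; ++i) {
		if (!mask_subset(doms[i], active))
			return false;
	}
	return true;
}
```

For example, with CPUs 0-3 active (0x0f), the domains { 0x03, 0x0c }
pass the check, while { 0x03, 0x1c } fail because the second domain
still contains CPU 4.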
Fixes: 388afd8549dc ("cpuset: remove async hotplug propagation work")
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
 kernel/cgroup/cpuset.c | 7 +++++++
 1 file changed, 7 insertions(+)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index ae2b1ad23607..4869922f8f02 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -916,6 +916,7 @@ static void rebuild_sched_domains_locked(void)
 	struct sched_domain_attr *attr;
 	cpumask_var_t *doms;
 	int ndoms;
+	int i;
 
 	lockdep_assert_cpus_held();
 	lockdep_assert_held(&cpuset_mutex);
@@ -931,6 +932,12 @@ static void rebuild_sched_domains_locked(void)
 	/* Generate domain masks and attrs */
 	ndoms = generate_sched_domains(&doms, &attr);
 
+	/* guarantee no CPU offlining in doms */
+	for (i = 0; i < ndoms; ++i) {
+		if (doms && !cpumask_subset(doms[i], cpu_active_mask))
+			return;
+	}
+
 	/* Have scheduler rebuild the domains */
 	partition_sched_domains(ndoms, doms, attr);
 }