From: David Vernet <void@manifault.com>
mainline inclusion
from mainline-v5.17-rc1
commit f5bdb34bf0c9314548f2d8e2360b703ff3610303
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60MYE
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
When initializing a 'struct klp_object' in klp_init_object_loaded(), and performing relocations in klp_resolve_symbols(), klp_find_object_symbol() is invoked to look up the address of a symbol in an already-loaded module (or vmlinux). This, in turn, calls kallsyms_on_each_symbol() or module_kallsyms_on_each_symbol() to find the address of the symbol that is being patched.
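The lookup described above follows a callback pattern: the iterator hands each (name, address) pair to a callback and stops as soon as the callback returns non-zero, which matches the "if (ret != 0) return ret;" check in the diff. The userspace C sketch below illustrates that pattern under stated assumptions: the table, table_on_each_symbol(), find_callback() and find_symbol() are hypothetical names, not kernel APIs, and the in-kernel livepatch callback also has to deal with symbols that share a name, which this sketch ignores.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flat symbol table standing in for kallsyms. */
struct sym {
	const char *name;
	unsigned long addr;
};

static const struct sym symtab[] = {
	{ "foo", 0x1000 },
	{ "bar", 0x2000 },
	{ "baz", 0x3000 },
};

/* Walk every symbol; stop early as soon as fn() returns non-zero. */
static int table_on_each_symbol(int (*fn)(void *, const char *, unsigned long),
				void *data)
{
	for (size_t i = 0; i < sizeof(symtab) / sizeof(symtab[0]); i++) {
		int ret = fn(data, symtab[i].name, symtab[i].addr);

		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Lookup state passed through the opaque data pointer. */
struct find_arg {
	const char *name;	/* symbol being searched for */
	unsigned long addr;	/* filled in on a match */
};

static int find_callback(void *data, const char *name, unsigned long addr)
{
	struct find_arg *args = data;

	if (strcmp(args->name, name) != 0)
		return 0;	/* no match: keep iterating */
	args->addr = addr;
	return 1;		/* match found: stop the walk */
}

/* Returns the address of name, or 0 if it is not in the table. */
unsigned long find_symbol(const char *name)
{
	struct find_arg args = { .name = name, .addr = 0 };

	table_on_each_symbol(find_callback, &args);
	return args.addr;
}
```

With this table, find_symbol("bar") returns 0x2000 and find_symbol("nope") returns 0. A failed or late match walks the entire table, which is why a long symbol table makes these lookups expensive.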
It turns out that symbol lookups often take up the most CPU time when enabling and disabling a patch, and may hog the CPU and cause other tasks on that CPU's runqueue to starve -- even in paths where interrupts are enabled. For example, under certain workloads, enabling a KLP patch with many objects or functions may starve ksoftirqd, and thus cause interrupts to be backlogged and delayed. This may end up causing TCP retransmits on the host where the KLP patch is being applied, and in general, may delay any interrupts serviced by ksoftirqd while the patch is being applied.
To ensure that symbol lookups do not hog the CPU, this patch adds a call to cond_resched() in kallsyms_on_each_symbol() and module_kallsyms_on_each_symbol(), which are invoked when looking up a symbol in vmlinux and in a module, respectively. Without this patch, if a live-patch is applied on a 36-core Intel host with heavy TCP traffic, a ~10x spike is observed in TCP retransmits while the patch is being applied. Additionally, collecting sched events with perf indicates that ksoftirqd is awakened ~1.3 seconds before it is eventually scheduled. With the patch, no increase in TCP retransmit events is observed, and ksoftirqd is scheduled shortly after it is awakened.
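As a rough userspace analogue of the fix, the sketch below adds one yield point per visited symbol, placed after the early-return check just as cond_resched() is in the diff. It is an illustration under stated assumptions, not kernel code: sched_yield() stands in for cond_resched(), and the table, yield_point(), on_each_symbol() and lookup_symbol() are hypothetical names. One real difference is noted in the comments: cond_resched() only switches away when the scheduler has flagged the task, whereas sched_yield() always offers the CPU.

```c
#include <sched.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol table; stands in for kallsyms or a module's symtab. */
struct sym {
	const char *name;
	unsigned long addr;
};

static const struct sym symtab[] = {
	{ "foo", 0x1000 },
	{ "bar", 0x2000 },
	{ "baz", 0x3000 },
};

static unsigned long yield_count;	/* how many yield points were hit */

/*
 * Userspace stand-in for cond_resched(). Unlike cond_resched(), which only
 * reschedules when a reschedule is pending, sched_yield() always offers the
 * CPU to other runnable threads.
 */
static void yield_point(void)
{
	yield_count++;
	sched_yield();
}

static int on_each_symbol(int (*fn)(void *, const char *, unsigned long),
			  void *data)
{
	for (size_t i = 0; i < sizeof(symtab) / sizeof(symtab[0]); i++) {
		int ret = fn(data, symtab[i].name, symtab[i].addr);

		if (ret != 0)
			return ret;	/* early exit skips the yield, as in the patch */
		yield_point();		/* one yield point per symbol visited */
	}
	return 0;
}

struct find_arg {
	const char *name;	/* symbol being searched for */
	unsigned long addr;	/* filled in on a match */
};

static int match_name(void *data, const char *name, unsigned long addr)
{
	struct find_arg *args = data;

	if (strcmp(args->name, name) != 0)
		return 0;	/* keep walking */
	args->addr = addr;
	return 1;		/* stop the walk */
}

/* Returns the address of name, or 0 if it is not in the table. */
unsigned long lookup_symbol(const char *name)
{
	struct find_arg args = { .name = name, .addr = 0 };

	on_each_symbol(match_name, &args);
	return args.addr;
}
```

Looking up "bar" here hits one yield point (after "foo"), while a failed lookup walks the whole table and hits one per symbol. Placing the yield inside the loop rather than once per lookup is what bounds how long a single long walk can hold the CPU; in the kernel this is affordable because cond_resched() is largely a cheap check when no reschedule is pending.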
Signed-off-by: David Vernet <void@manifault.com>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20211229215646.830451-1-void@manifault.com
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Reviewed-by: Kuohai Xu <xukuohai@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
---
 kernel/kallsyms.c | 1 +
 kernel/module.c   | 2 ++
 2 files changed, 3 insertions(+)
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index fe9de067771c..c6738525fe11 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -191,6 +191,7 @@ int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *,
 		ret = fn(data, namebuf, NULL, kallsyms_sym_address(i));
 		if (ret != 0)
 			return ret;
+		cond_resched();
 	}
 	return module_kallsyms_on_each_symbol(fn, data);
 }
diff --git a/kernel/module.c b/kernel/module.c
index cfa3d8c370a8..00aabcd30e4e 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -4484,6 +4484,8 @@ int module_kallsyms_on_each_symbol(int (*fn)(void *, const char *,
 				 mod, kallsyms_symbol_value(sym));
 			if (ret != 0)
 				return ret;
+
+			cond_resched();
 		}
 	}
 	return 0;