[PATCH OLK-6.6 000/140] *** sched-ext-bpf-part ***
*** BLURB HERE *** Alan Maguire (16): kbuild,bpf: Switch to using --btf_features for pahole v1.26 and later kbuild, bpf: Use test-ge check for v1.25-only pahole libbpf: Add btf__distill_base() creating split BTF with distilled base BTF selftests/bpf: Test distilled base, split BTF generation libbpf: Split BTF relocation selftests/bpf: Extend distilled BTF tests to cover BTF relocation resolve_btfids: Handle presence of .BTF.base section libbpf: BTF relocation followup fixing naming, loop logic module, bpf: Store BTF base pointer in struct module libbpf: Split field iter code into its own file kernel libbpf,bpf: Share BTF relocate-related code with kernel kbuild,bpf: Add module-specific pahole flags for distilled base BTF selftests/bpf: Add kfunc_call test for simple dtor in bpf_testmod bpf: fix build when CONFIG_DEBUG_INFO_BTF[_MODULES] is undefined libbpf: Fix error handling in btf__distill_base() libbpf: Fix license for btf_relocate.c Alexander Lobakin (1): bitops: make BYTES_TO_BITS() treewide-available Alexei Starovoitov (2): s390/bpf: Fix indirect trampoline generation bpf: Introduce "volatile compare" macros Andrii Nakryiko (13): bpf: Emit global subprog name in verifier logs bpf: Validate global subprogs lazily selftests/bpf: Add lazy global subprog validation tests libbpf: Add btf__new_split() API that was declared but not implemented bpf: move sleepable flag from bpf_prog_aux to bpf_prog libbpf: Add BTF field iterator libbpf: Make use of BTF field iterator in BPF linker code libbpf: Make use of BTF field iterator in BTF handling code bpftool: Use BTF field iterator in btfgen libbpf: Remove callback-based type/string BTF field visitor helpers bpf: extract iterator argument type and name validation logic bpf: allow passing struct bpf_iter_<type> as kfunc arguments selftests/bpf: test passing iterator to a kfunc Benjamin Tissoires (1): bpf: introduce in_sleepable() helper Christophe Leroy (2): bpf: Remove arch_unprotect_bpf_trampoline() bpf: Check return from set_memory_rox() Chuyi Zhou (13): cgroup: Prepare for using css_task_iter_*() in BPF bpf: Introduce css_task open-coded iterator kfuncs bpf: Introduce task open coded iterator kfuncs bpf: Introduce css open-coded iterator kfuncs bpf: teach the verifier to enforce css_iter and task_iter in RCU CS bpf: Let bpf_iter_task_new accept null task ptr selftests/bpf: rename bpf_iter_task.c to bpf_iter_tasks.c selftests/bpf: Add tests for open-coded task and css iter bpf: Relax allowlist for css_task iter selftests/bpf: Add tests for css_task iter combining with cgroup iter selftests/bpf: Add test for using css_task iter in sleepable progs bpf: Let verifier consider {task,cgroup} is trusted in bpf_iter_reg selftests/bpf: get trusted cgrp from bpf_iter__cgroup directly Daniel Xu (3): bpf: btf: Support flags for BTF_SET8 sets bpf: btf: Add BTF_KFUNCS_START/END macro pair bpf: treewide: Annotate BPF kfuncs in BTF Dave Marchevsky (6): bpf: Don't explicitly emit BTF for struct btf_iter_num selftests/bpf: Rename bpf_iter_task_vma.c to bpf_iter_task_vmas.c bpf: Introduce task_vma open-coded iterator kfuncs selftests/bpf: Add tests for open-coded task_vma iter bpf: Add __bpf_kfunc_{start,end}_defs macros bpf: Add __bpf_hook_{start,end} macros David Vernet (2): bpf: Add ability to pin bpf timer to calling CPU selftests/bpf: Test pinning bpf timer to a core Eduard Zingerman (2): libbpf: Make btf_parse_elf process .BTF.base transparently selftests/bpf: Check if distilled base inherits source endianness Geliang Tang (1): bpf, btf: Check btf for 
register_bpf_struct_ops Hou Tao (6): bpf: Free dynamically allocated bits in bpf_iter_bits_destroy() bpf: Add bpf_mem_alloc_check_size() helper bpf: Check the validity of nr_words in bpf_iter_bits_new() bpf: Use __u64 to save the bits in bits iterator selftests/bpf: Add three test cases for bits_iter selftests/bpf: Use -4095 as the bad address for bits iterator Kui-Feng Lee (29): bpf: refactory struct_ops type initialization to a function. bpf: get type information with BTF_ID_LIST bpf, net: introduce bpf_struct_ops_desc. bpf: add struct_ops_tab to btf. bpf: make struct_ops_map support btfs other than btf_vmlinux. bpf: pass btf object id in bpf_map_info. bpf: lookup struct_ops types from a given module BTF. bpf: pass attached BTF to the bpf_struct_ops subsystem bpf: hold module refcnt in bpf_struct_ops map creation and prog verification. bpf: validate value_type bpf, net: switch to dynamic registration libbpf: Find correct module BTFs for struct_ops maps and progs. bpf: export btf_ctx_access to modules. selftests/bpf: test case for register_bpf_struct_ops(). bpf: Fix error checks against bpf_get_btf_vmlinux(). bpf: Remove an unnecessary check. selftests/bpf: Suppress warning message of an unused variable. bpf: add btf pointer to struct bpf_ctx_arg_aux. bpf: Move __kfunc_param_match_suffix() to btf.c. bpf: Create argument information for nullable arguments. selftests/bpf: Test PTR_MAYBE_NULL arguments of struct_ops operators. libbpf: Set btf_value_type_id of struct bpf_map for struct_ops. libbpf: Convert st_ops->data to shadow type. bpftool: Generated shadow variables for struct_ops maps. bpftool: Add an example for struct_ops map and shadow type. selftests/bpf: Test if shadow types work correctly. bpf, net: validate struct_ops when updating value. bpf: struct_ops supports more than one page for trampolines. selftests/bpf: Test struct_ops maps with a large number of struct_ops program. 
Kumar Kartikeya Dwivedi (4): bpf: Allow calling static subprogs while holding a bpf_spin_lock selftests/bpf: Add test for static subprog call in lock cs bpf: Transfer RCU lock state between subprog calls selftests/bpf: Add tests for RCU lock transfer between subprogs Luo Gengkun (7): bpf: Fix kabi-breakage for bpf_func_info_aux bpf: Fix kabi-breakage for bpf_tramp_image bpf: Fix kabi for bpf_attr bpf_verifier: Fix kabi for bpf_verifier_env bpf: Fix kabi for bpf_ctx_arg_aux bpf: Fix kabi for bpf_prog_aux and bpf_prog selftests/bpf: modify test_loader that didn't support running bpf_prog_type_syscall programs Martin KaFai Lau (4): libbpf: Ensure undefined bpf_attr field stays 0 bpf: Remove unnecessary err < 0 check in bpf_struct_ops_map_update_elem bpf: Fix a crash when btf_parse_base() returns an error pointer bpf: Reject struct_ops registration that uses module ptr and the module btf_id is missing Masahiro Yamada (1): kbuild: avoid too many execution of scripts/pahole-flags.sh Matthieu Baerts (1): bpf: fix compilation error without CGROUPS Peter Zijlstra (6): cfi: Flip headers x86/cfi,bpf: Fix BPF JIT call x86/cfi,bpf: Fix bpf_callback_t CFI x86/cfi,bpf: Fix bpf_struct_ops CFI cfi: Add CFI_NOSEAL() bpf: Fix dtor CFI Pu Lehui (8): riscv, bpf: Fix unpredictable kernel crash about RV64 struct_ops bpf: Fix kabi breakage in struct module riscv, bpf: Fix out-of-bounds issue when preparing trampoline image selftests/bpf: Fix btf leak on new btf alloc failure in btf_distill test libbpf: Fix return zero when elf_begin failed libbpf: Fix incorrect traversal end type ID when marking BTF_IS_EMBEDDED selftests/bpf: Add distilled BTF test about marking BTF_IS_EMBEDDED selftests/bpf: Add file_read_pattern to gitignore Song Liu (8): bpf: Charge modmem for struct_ops trampoline bpf: Let bpf_prog_pack_free handle any pointer bpf: Adjust argument names of arch_prepare_bpf_trampoline() bpf: Add helpers for trampoline image management bpf, x86: Adjust arch_prepare_bpf_trampoline return value bpf: Add arch_bpf_trampoline_size() bpf: Use arch_bpf_trampoline_size x86, bpf: Use bpf_prog_pack for bpf trampoline T.J. 
Mercier (1): bpf, docs: Fix broken link to renamed bpf_iter_task_vmas.c Tony Ambardar (1): libbpf: Ensure new BTF objects inherit input endianness Yafang Shao (2): bpf: Add bits iterator selftests/bpf: Add selftest for bits iter Documentation/bpf/bpf_iterators.rst | 2 +- Documentation/bpf/kfuncs.rst | 14 +- MAINTAINERS | 2 +- Makefile | 4 +- arch/arm64/kernel/bpf-rvi.c | 4 +- arch/arm64/net/bpf_jit_comp.c | 55 +- arch/riscv/include/asm/cfi.h | 3 +- arch/riscv/kernel/cfi.c | 2 +- arch/riscv/net/bpf_jit_comp64.c | 48 +- arch/s390/net/bpf_jit_comp.c | 59 +- arch/x86/include/asm/cfi.h | 126 ++- arch/x86/kernel/alternative.c | 87 +- arch/x86/kernel/cfi.c | 4 +- arch/x86/net/bpf_jit_comp.c | 261 ++++-- block/blk-cgroup.c | 4 +- drivers/hid/bpf/hid_bpf_dispatch.c | 12 +- fs/proc/stat.c | 4 +- include/asm-generic/Kbuild | 1 + include/asm-generic/cfi.h | 5 + include/linux/bitops.h | 2 + include/linux/bpf.h | 130 ++- include/linux/bpf_mem_alloc.h | 3 + include/linux/bpf_verifier.h | 21 +- include/linux/btf.h | 105 +++ include/linux/btf_ids.h | 21 +- include/linux/cfi.h | 12 + include/linux/cgroup.h | 12 +- include/linux/filter.h | 2 +- include/linux/module.h | 5 + include/uapi/linux/bpf.h | 16 +- kernel/bpf-rvi/common_kfuncs.c | 4 +- kernel/bpf/Makefile | 8 +- kernel/bpf/bpf_iter.c | 12 +- kernel/bpf/bpf_struct_ops.c | 748 ++++++++++++------ kernel/bpf/bpf_struct_ops_types.h | 12 - kernel/bpf/btf.c | 431 ++++++++-- kernel/bpf/cgroup_iter.c | 65 +- kernel/bpf/core.c | 76 +- kernel/bpf/cpumask.c | 18 +- kernel/bpf/dispatcher.c | 7 +- kernel/bpf/helpers.c | 202 ++++- kernel/bpf/map_iter.c | 10 +- kernel/bpf/memalloc.c | 14 +- kernel/bpf/syscall.c | 12 +- kernel/bpf/task_iter.c | 242 +++++- kernel/bpf/trampoline.c | 99 ++- kernel/bpf/verifier.c | 317 ++++++-- kernel/cgroup/cgroup.c | 18 +- kernel/cgroup/cpuset.c | 4 +- kernel/cgroup/rstat.c | 13 +- kernel/events/core.c | 2 +- kernel/module/main.c | 5 +- kernel/sched/bpf_sched.c | 8 +- kernel/sched/cpuacct.c | 4 +- kernel/trace/bpf_trace.c | 12 +- kernel/trace/trace_probe.c | 2 - net/bpf/bpf_dummy_struct_ops.c | 72 +- net/bpf/test_run.c | 30 +- net/core/filter.c | 33 +- net/core/xdp.c | 10 +- net/ipv4/bpf_tcp_ca.c | 93 ++- net/ipv4/fou_bpf.c | 10 +- net/ipv4/tcp_bbr.c | 4 +- net/ipv4/tcp_cong.c | 6 +- net/ipv4/tcp_cubic.c | 4 +- net/ipv4/tcp_dctcp.c | 4 +- net/netfilter/nf_conntrack_bpf.c | 10 +- net/netfilter/nf_nat_bpf.c | 10 +- net/socket.c | 8 +- net/xfrm/xfrm_interface_bpf.c | 10 +- scripts/Makefile.btf | 33 + scripts/Makefile.modfinal | 2 +- scripts/pahole-flags.sh | 30 - .../bpf/bpftool/Documentation/bpftool-gen.rst | 58 +- tools/bpf/bpftool/gen.c | 253 +++++- tools/bpf/resolve_btfids/main.c | 8 + tools/include/linux/bitops.h | 2 + tools/include/uapi/linux/bpf.h | 14 +- tools/lib/bpf/Build | 2 +- tools/lib/bpf/bpf.c | 4 +- tools/lib/bpf/bpf.h | 4 +- tools/lib/bpf/btf.c | 704 ++++++++++++----- tools/lib/bpf/btf.h | 36 + tools/lib/bpf/btf_iter.c | 177 +++++ tools/lib/bpf/btf_relocate.c | 519 ++++++++++++ tools/lib/bpf/libbpf.c | 86 +- tools/lib/bpf/libbpf.map | 4 +- tools/lib/bpf/libbpf_internal.h | 29 +- tools/lib/bpf/libbpf_probes.c | 1 + tools/lib/bpf/linker.c | 58 +- tools/perf/util/probe-finder.c | 4 +- tools/testing/selftests/bpf/.gitignore | 1 + .../testing/selftests/bpf/bpf_experimental.h | 96 +++ .../selftests/bpf/bpf_testmod/bpf_testmod.c | 160 +++- .../selftests/bpf/bpf_testmod/bpf_testmod.h | 61 ++ .../bpf/bpf_testmod/bpf_testmod_kfunc.h | 9 + .../selftests/bpf/prog_tests/bpf_iter.c | 44 +- .../selftests/bpf/prog_tests/btf_distill.c | 
692 ++++++++++++++++ .../selftests/bpf/prog_tests/cgroup_iter.c | 33 + .../testing/selftests/bpf/prog_tests/iters.c | 209 +++++ .../selftests/bpf/prog_tests/kfunc_call.c | 1 + .../selftests/bpf/prog_tests/rcu_read_lock.c | 6 + .../selftests/bpf/prog_tests/spin_lock.c | 2 + .../prog_tests/test_struct_ops_maybe_null.c | 46 ++ .../bpf/prog_tests/test_struct_ops_module.c | 86 ++ .../prog_tests/test_struct_ops_multi_pages.c | 30 + .../testing/selftests/bpf/prog_tests/timer.c | 4 + .../selftests/bpf/prog_tests/verifier.c | 4 + ...f_iter_task_vma.c => bpf_iter_task_vmas.c} | 0 .../{bpf_iter_task.c => bpf_iter_tasks.c} | 0 tools/testing/selftests/bpf/progs/iters_css.c | 72 ++ .../selftests/bpf/progs/iters_css_task.c | 102 +++ .../testing/selftests/bpf/progs/iters_task.c | 41 + .../selftests/bpf/progs/iters_task_failure.c | 105 +++ .../selftests/bpf/progs/iters_task_vma.c | 43 + .../selftests/bpf/progs/iters_testmod_seq.c | 50 ++ .../selftests/bpf/progs/kfunc_call_test.c | 37 + .../selftests/bpf/progs/rcu_read_lock.c | 120 +++ .../bpf/progs/struct_ops_maybe_null.c | 29 + .../bpf/progs/struct_ops_maybe_null_fail.c | 24 + .../selftests/bpf/progs/struct_ops_module.c | 37 + .../bpf/progs/struct_ops_multi_pages.c | 102 +++ .../selftests/bpf/progs/test_global_func12.c | 4 +- .../selftests/bpf/progs/test_spin_lock.c | 65 ++ .../selftests/bpf/progs/test_spin_lock_fail.c | 44 ++ tools/testing/selftests/bpf/progs/timer.c | 63 +- .../selftests/bpf/progs/verifier_bits_iter.c | 232 ++++++ .../bpf/progs/verifier_global_subprogs.c | 92 +++ .../selftests/bpf/progs/verifier_spin_lock.c | 2 +- .../bpf/progs/verifier_subprog_precision.c | 4 +- tools/testing/selftests/bpf/test_loader.c | 10 +- 131 files changed, 7326 insertions(+), 1139 deletions(-) create mode 100644 include/asm-generic/cfi.h delete mode 100644 kernel/bpf/bpf_struct_ops_types.h create mode 100644 scripts/Makefile.btf delete mode 100755 scripts/pahole-flags.sh create mode 100644 tools/lib/bpf/btf_iter.c create mode 100644 tools/lib/bpf/btf_relocate.c create mode 100644 tools/testing/selftests/bpf/prog_tests/btf_distill.c create mode 100644 tools/testing/selftests/bpf/prog_tests/test_struct_ops_maybe_null.c create mode 100644 tools/testing/selftests/bpf/prog_tests/test_struct_ops_module.c create mode 100644 tools/testing/selftests/bpf/prog_tests/test_struct_ops_multi_pages.c rename tools/testing/selftests/bpf/progs/{bpf_iter_task_vma.c => bpf_iter_task_vmas.c} (100%) rename tools/testing/selftests/bpf/progs/{bpf_iter_task.c => bpf_iter_tasks.c} (100%) create mode 100644 tools/testing/selftests/bpf/progs/iters_css.c create mode 100644 tools/testing/selftests/bpf/progs/iters_css_task.c create mode 100644 tools/testing/selftests/bpf/progs/iters_task.c create mode 100644 tools/testing/selftests/bpf/progs/iters_task_failure.c create mode 100644 tools/testing/selftests/bpf/progs/iters_task_vma.c create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_maybe_null.c create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_maybe_null_fail.c create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_module.c create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_multi_pages.c create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c create mode 100644 tools/testing/selftests/bpf/progs/verifier_global_subprogs.c -- 2.34.1
FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully!
Pull request link: https://atomgit.com/openeuler/kernel/merge_requests/21735
Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/ZGZ...
From: Song Liu <song@kernel.org> mainline inclusion from mainline-v6.7-rc1 commit 5c04433daf9ed8b28d4900112be1fd19e1786b25 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Current code charges modmem for regular trampoline, but not for struct_ops trampoline. Add bpf_jit_[charge|uncharge]_modmem() to struct_ops so the trampoline is charged in both cases. Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20230914222542.2986059-1-song@kernel.org Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- kernel/bpf/bpf_struct_ops.c | 26 ++++++++++++++++++++++---- 1 file changed, 22 insertions(+), 4 deletions(-) diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c index fdc3e8705a3c..db6176fb64dc 100644 --- a/kernel/bpf/bpf_struct_ops.c +++ b/kernel/bpf/bpf_struct_ops.c @@ -615,7 +615,10 @@ static void __bpf_struct_ops_map_free(struct bpf_map *map) if (st_map->links) bpf_struct_ops_map_put_progs(st_map); bpf_map_area_free(st_map->links); - bpf_jit_free_exec(st_map->image); + if (st_map->image) { + bpf_jit_free_exec(st_map->image); + bpf_jit_uncharge_modmem(PAGE_SIZE); + } bpf_map_area_free(st_map->uvalue); bpf_map_area_free(st_map); } @@ -657,6 +660,7 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr) struct bpf_struct_ops_map *st_map; const struct btf_type *t, *vt; struct bpf_map *map; + int ret; st_ops = bpf_struct_ops_find_value(attr->btf_vmlinux_value_type_id); if (!st_ops) @@ -681,12 +685,27 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr) st_map->st_ops = st_ops; map = &st_map->map; + ret = bpf_jit_charge_modmem(PAGE_SIZE); + if (ret) { + __bpf_struct_ops_map_free(map); + return ERR_PTR(ret); + } + + st_map->image = bpf_jit_alloc_exec(PAGE_SIZE); + if (!st_map->image) { + /* __bpf_struct_ops_map_free() uses st_map->image as flag + * for "charged or not". In this case, we need to unchange + * here. + */ + bpf_jit_uncharge_modmem(PAGE_SIZE); + __bpf_struct_ops_map_free(map); + return ERR_PTR(-ENOMEM); + } st_map->uvalue = bpf_map_area_alloc(vt->size, NUMA_NO_NODE); st_map->links = bpf_map_area_alloc(btf_type_vlen(t) * sizeof(struct bpf_links *), NUMA_NO_NODE); - st_map->image = bpf_jit_alloc_exec(PAGE_SIZE); - if (!st_map->uvalue || !st_map->links || !st_map->image) { + if (!st_map->uvalue || !st_map->links) { __bpf_struct_ops_map_free(map); return ERR_PTR(-ENOMEM); } @@ -907,4 +926,3 @@ int bpf_struct_ops_link_create(union bpf_attr *attr) kfree(link); return err; } - -- 2.34.1
From: David Vernet <void@manifault.com> mainline inclusion from mainline-v6.7-rc1 commit d6247ecb6c1e17d7a33317090627f5bfe563cbb2 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- BPF supports creating high resolution timers using bpf_timer_* helper functions. Currently, only the BPF_F_TIMER_ABS flag is supported, which specifies that the timeout should be interpreted as absolute time. It would also be useful to be able to pin that timer to a core. For example, if you wanted to make a subset of cores run without timer interrupts, and only have the timer be invoked on a single core. This patch adds support for this with a new BPF_F_TIMER_CPU_PIN flag. When specified, the HRTIMER_MODE_PINNED flag is passed to hrtimer_start(). A subsequent patch will update selftests to validate. Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <song@kernel.org> Acked-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/bpf/20231004162339.200702-2-void@manifault.com Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- include/uapi/linux/bpf.h | 4 ++++ kernel/bpf/helpers.c | 5 ++++- tools/include/uapi/linux/bpf.h | 4 ++++ 3 files changed, 12 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 5aca37b62f79..31e5b8061fe5 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -5124,6 +5124,8 @@ union bpf_attr { * **BPF_F_TIMER_ABS** * Start the timer in absolute expire value instead of the * default relative one. + * **BPF_F_TIMER_CPU_PIN** + * Timer will be pinned to the CPU of the caller. * * Return * 0 on success. @@ -7348,9 +7350,11 @@ struct bpf_core_relo { * Flags to control bpf_timer_start() behaviour. * - BPF_F_TIMER_ABS: Timeout passed is absolute time, by default it is * relative to current time. + * - BPF_F_TIMER_CPU_PIN: Timer will be pinned to the CPU of the caller. */ enum { BPF_F_TIMER_ABS = (1ULL << 0), + BPF_F_TIMER_CPU_PIN = (1ULL << 1), }; /* BPF numbers iterator state */ diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index f40085f4de31..c889a9dee6c6 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -1363,7 +1363,7 @@ BPF_CALL_3(bpf_timer_start, struct bpf_async_kern *, timer, u64, nsecs, u64, fla if (in_nmi()) return -EOPNOTSUPP; - if (flags > BPF_F_TIMER_ABS) + if (flags & ~(BPF_F_TIMER_ABS | BPF_F_TIMER_CPU_PIN)) return -EINVAL; __bpf_spin_lock_irqsave(&timer->lock); t = timer->timer; @@ -1377,6 +1377,9 @@ BPF_CALL_3(bpf_timer_start, struct bpf_async_kern *, timer, u64, nsecs, u64, fla else mode = HRTIMER_MODE_REL_SOFT; + if (flags & BPF_F_TIMER_CPU_PIN) + mode |= HRTIMER_MODE_PINNED; + hrtimer_start(&t->timer, ns_to_ktime(nsecs), mode); out: __bpf_spin_unlock_irqrestore(&timer->lock); diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 337fc55cd500..7be085c034f7 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -5124,6 +5124,8 @@ union bpf_attr { * **BPF_F_TIMER_ABS** * Start the timer in absolute expire value instead of the * default relative one. + * **BPF_F_TIMER_CPU_PIN** + * Timer will be pinned to the CPU of the caller. * * Return * 0 on success. @@ -7351,9 +7353,11 @@ struct bpf_core_relo { * Flags to control bpf_timer_start() behaviour. 
* - BPF_F_TIMER_ABS: Timeout passed is absolute time, by default it is * relative to current time. + * - BPF_F_TIMER_CPU_PIN: Timer will be pinned to the CPU of the caller. */ enum { BPF_F_TIMER_ABS = (1ULL << 0), + BPF_F_TIMER_CPU_PIN = (1ULL << 1), }; /* BPF numbers iterator state */ -- 2.34.1
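The selftest in the next patch exercises both the relative and absolute variants. For orientation, a minimal, untested sketch of the new flag in use (hypothetical map, program and attach-point names; only the existing bpf_timer_init()/bpf_timer_set_callback()/bpf_timer_start() helpers are assumed):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

#define CLOCK_MONOTONIC 1	/* uapi clock id, not emitted into vmlinux.h */

struct elem {
	struct bpf_timer t;
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, int);
	__type(value, struct elem);
} pinned_timer SEC(".maps");

static int timer_cb(void *map, int *key, struct bpf_timer *timer)
{
	/* with BPF_F_TIMER_CPU_PIN this runs on the CPU that armed the timer */
	return 0;
}

SEC("fentry/bpf_fentry_test1")
int BPF_PROG(arm_pinned_timer)
{
	struct bpf_timer *timer;
	int key = 0;

	timer = bpf_map_lookup_elem(&pinned_timer, &key);
	if (!timer)
		return 0;

	bpf_timer_init(timer, &pinned_timer, CLOCK_MONOTONIC);
	bpf_timer_set_callback(timer, timer_cb);
	/* 1ms relative timeout, pinned to the calling CPU */
	bpf_timer_start(timer, 1000000, BPF_F_TIMER_CPU_PIN);
	return 0;
}

char _license[] SEC("license") = "GPL";

Combining BPF_F_TIMER_CPU_PIN with BPF_F_TIMER_ABS works the same way; only the interpretation of the timeout value changes.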
From: David Vernet <void@manifault.com> mainline inclusion from mainline-v6.7-rc1 commit 0d7ae06860753bb30b3731302b994da071120d00 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Now that we support pinning a BPF timer to the current core, we should test it with some selftests. This patch adds two new testcases to the timer suite, which verifies that a BPF timer both with and without BPF_F_TIMER_ABS, can be pinned to the calling core with BPF_F_TIMER_CPU_PIN. Signed-off-by: David Vernet <void@manifault.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <song@kernel.org> Acked-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/bpf/20231004162339.200702-3-void@manifault.com Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- .../testing/selftests/bpf/prog_tests/timer.c | 4 ++ tools/testing/selftests/bpf/progs/timer.c | 63 ++++++++++++++++++- 2 files changed, 66 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/bpf/prog_tests/timer.c b/tools/testing/selftests/bpf/prog_tests/timer.c index ce2c61d62fc6..760ad96b4be0 100644 --- a/tools/testing/selftests/bpf/prog_tests/timer.c +++ b/tools/testing/selftests/bpf/prog_tests/timer.c @@ -15,6 +15,7 @@ static int timer(struct timer *timer_skel) ASSERT_EQ(timer_skel->data->callback_check, 52, "callback_check1"); ASSERT_EQ(timer_skel->data->callback2_check, 52, "callback2_check1"); + ASSERT_EQ(timer_skel->bss->pinned_callback_check, 0, "pinned_callback_check1"); prog_fd = bpf_program__fd(timer_skel->progs.test1); err = bpf_prog_test_run_opts(prog_fd, &topts); @@ -33,6 +34,9 @@ static int timer(struct timer *timer_skel) /* check that timer_cb3() was executed twice */ ASSERT_EQ(timer_skel->bss->abs_data, 12, "abs_data"); + /* check that timer_cb_pinned() was executed twice */ + ASSERT_EQ(timer_skel->bss->pinned_callback_check, 2, "pinned_callback_check"); + /* check that there were no errors in timer execution */ ASSERT_EQ(timer_skel->bss->err, 0, "err"); diff --git a/tools/testing/selftests/bpf/progs/timer.c b/tools/testing/selftests/bpf/progs/timer.c index 9a16d95213e1..8b946c8188c6 100644 --- a/tools/testing/selftests/bpf/progs/timer.c +++ b/tools/testing/selftests/bpf/progs/timer.c @@ -51,7 +51,7 @@ struct { __uint(max_entries, 1); __type(key, int); __type(value, struct elem); -} abs_timer SEC(".maps"); +} abs_timer SEC(".maps"), soft_timer_pinned SEC(".maps"), abs_timer_pinned SEC(".maps"); __u64 bss_data; __u64 abs_data; @@ -59,6 +59,8 @@ __u64 err; __u64 ok; __u64 callback_check = 52; __u64 callback2_check = 52; +__u64 pinned_callback_check; +__s32 pinned_cpu; #define ARRAY 1 #define HTAB 2 @@ -329,3 +331,62 @@ int BPF_PROG2(test3, int, a) return 0; } + +/* callback for pinned timer */ +static int timer_cb_pinned(void *map, int *key, struct bpf_timer *timer) +{ + __s32 cpu = bpf_get_smp_processor_id(); + + if (cpu != pinned_cpu) + err |= 16384; + + pinned_callback_check++; + return 0; +} + +static void test_pinned_timer(bool soft) +{ + int key = 0; + void *map; + struct bpf_timer *timer; + __u64 flags = BPF_F_TIMER_CPU_PIN; + __u64 start_time; + + if (soft) { + map = &soft_timer_pinned; + start_time = 0; + } else { + map = &abs_timer_pinned; + start_time = bpf_ktime_get_boot_ns(); + flags |= BPF_F_TIMER_ABS; + } + + timer = bpf_map_lookup_elem(map, &key); + if (timer) { + if (bpf_timer_init(timer, 
map, CLOCK_BOOTTIME) != 0) + err |= 4096; + bpf_timer_set_callback(timer, timer_cb_pinned); + pinned_cpu = bpf_get_smp_processor_id(); + bpf_timer_start(timer, start_time + 1000, flags); + } else { + err |= 8192; + } +} + +SEC("fentry/bpf_fentry_test4") +int BPF_PROG2(test4, int, a) +{ + bpf_printk("test4"); + test_pinned_timer(true); + + return 0; +} + +SEC("fentry/bpf_fentry_test5") +int BPF_PROG2(test5, int, a) +{ + bpf_printk("test5"); + test_pinned_timer(false); + + return 0; +} -- 2.34.1
From: Dave Marchevsky <davemarchevsky@fb.com> mainline inclusion from mainline-v6.7-rc1 commit f10ca5da5bd71e5cefed7995e75a7c873ce3816e category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Commit 6018e1f407cc ("bpf: implement numbers iterator") added the BTF_TYPE_EMIT line that this patch is modifying. The struct btf_iter_num doesn't exist, so only a forward declaration is emitted in BTF: FWD 'btf_iter_num' fwd_kind=struct That commit was probably hoping to ensure that struct bpf_iter_num is emitted in vmlinux BTF. A previous version of this patch changed the line to emit the correct type, but Yonghong confirmed that it would definitely be emitted regardless in [0], so this patch simply removes the line. This isn't marked "Fixes" because the extraneous btf_iter_num FWD wasn't causing any issues that I noticed, aside from mild confusion when I looked through the code. [0]: https://lore.kernel.org/bpf/25d08207-43e6-36a8-5e0f-47a913d4cda5@linux.dev/ Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20231013204426.1074286-2-davemarchevsky@fb.com Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- kernel/bpf/bpf_iter.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/kernel/bpf/bpf_iter.c b/kernel/bpf/bpf_iter.c index 96856f130cbf..833faa04461b 100644 --- a/kernel/bpf/bpf_iter.c +++ b/kernel/bpf/bpf_iter.c @@ -793,8 +793,6 @@ __bpf_kfunc int bpf_iter_num_new(struct bpf_iter_num *it, int start, int end) BUILD_BUG_ON(sizeof(struct bpf_iter_num_kern) != sizeof(struct bpf_iter_num)); BUILD_BUG_ON(__alignof__(struct bpf_iter_num_kern) != __alignof__(struct bpf_iter_num)); - BTF_TYPE_EMIT(struct btf_iter_num); - /* start == end is legit, it's an empty range and we'll just get NULL * on first (and any subsequent) bpf_iter_num_next() call */ -- 2.34.1
From: Dave Marchevsky <davemarchevsky@fb.com> mainline inclusion from mainline-v6.7-rc1 commit 45b38941c81f16bb2e9b0115f03e164a3576ea8b category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Further patches in this series will add a struct bpf_iter_task_vma, which will result in a name collision with the selftest prog renamed in this patch. Rename the selftest to avoid the collision. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20231013204426.1074286-3-davemarchevsky@fb.com Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- .../selftests/bpf/prog_tests/bpf_iter.c | 26 +++++++++---------- ...f_iter_task_vma.c => bpf_iter_task_vmas.c} | 0 2 files changed, 13 insertions(+), 13 deletions(-) rename tools/testing/selftests/bpf/progs/{bpf_iter_task_vma.c => bpf_iter_task_vmas.c} (100%) diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c index f141e278b16f..caea3c500cc0 100644 --- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c +++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c @@ -10,7 +10,7 @@ #include "bpf_iter_task.skel.h" #include "bpf_iter_task_stack.skel.h" #include "bpf_iter_task_file.skel.h" -#include "bpf_iter_task_vma.skel.h" +#include "bpf_iter_task_vmas.skel.h" #include "bpf_iter_task_btf.skel.h" #include "bpf_iter_tcp4.skel.h" #include "bpf_iter_tcp6.skel.h" @@ -1401,19 +1401,19 @@ static void str_strip_first_line(char *str) static void test_task_vma_common(struct bpf_iter_attach_opts *opts) { int err, iter_fd = -1, proc_maps_fd = -1; - struct bpf_iter_task_vma *skel; + struct bpf_iter_task_vmas *skel; int len, read_size = 4; char maps_path[64]; - skel = bpf_iter_task_vma__open(); - if (!ASSERT_OK_PTR(skel, "bpf_iter_task_vma__open")) + skel = bpf_iter_task_vmas__open(); + if (!ASSERT_OK_PTR(skel, "bpf_iter_task_vmas__open")) return; skel->bss->pid = getpid(); skel->bss->one_task = opts ? 
1 : 0; - err = bpf_iter_task_vma__load(skel); - if (!ASSERT_OK(err, "bpf_iter_task_vma__load")) + err = bpf_iter_task_vmas__load(skel); + if (!ASSERT_OK(err, "bpf_iter_task_vmas__load")) goto out; skel->links.proc_maps = bpf_program__attach_iter( @@ -1464,25 +1464,25 @@ static void test_task_vma_common(struct bpf_iter_attach_opts *opts) out: close(proc_maps_fd); close(iter_fd); - bpf_iter_task_vma__destroy(skel); + bpf_iter_task_vmas__destroy(skel); } static void test_task_vma_dead_task(void) { - struct bpf_iter_task_vma *skel; + struct bpf_iter_task_vmas *skel; int wstatus, child_pid = -1; time_t start_tm, cur_tm; int err, iter_fd = -1; int wait_sec = 3; - skel = bpf_iter_task_vma__open(); - if (!ASSERT_OK_PTR(skel, "bpf_iter_task_vma__open")) + skel = bpf_iter_task_vmas__open(); + if (!ASSERT_OK_PTR(skel, "bpf_iter_task_vmas__open")) return; skel->bss->pid = getpid(); - err = bpf_iter_task_vma__load(skel); - if (!ASSERT_OK(err, "bpf_iter_task_vma__load")) + err = bpf_iter_task_vmas__load(skel); + if (!ASSERT_OK(err, "bpf_iter_task_vmas__load")) goto out; skel->links.proc_maps = bpf_program__attach_iter( @@ -1535,7 +1535,7 @@ static void test_task_vma_dead_task(void) out: waitpid(child_pid, &wstatus, 0); close(iter_fd); - bpf_iter_task_vma__destroy(skel); + bpf_iter_task_vmas__destroy(skel); } void test_bpf_sockmap_map_iter_fd(void) diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c b/tools/testing/selftests/bpf/progs/bpf_iter_task_vmas.c similarity index 100% rename from tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c rename to tools/testing/selftests/bpf/progs/bpf_iter_task_vmas.c -- 2.34.1
From: Dave Marchevsky <davemarchevsky@fb.com> mainline inclusion from mainline-v6.7-rc1 commit 4ac4546821584736798aaa9e97da9f6eaf689ea3 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- This patch adds kfuncs bpf_iter_task_vma_{new,next,destroy} which allow creation and manipulation of struct bpf_iter_task_vma in open-coded iterator style. BPF programs can use these kfuncs directly or through bpf_for_each macro for natural-looking iteration of all task vmas. The implementation borrows heavily from bpf_find_vma helper's locking - differing only in that it holds the mmap_read lock for all iterations while the helper only executes its provided callback on a maximum of 1 vma. Aside from locking, struct vma_iterator and vma_next do all the heavy lifting. A pointer to an inner data struct, struct bpf_iter_task_vma_data, is the only field in struct bpf_iter_task_vma. This is because the inner data struct contains a struct vma_iterator (not ptr), whose size is likely to change under us. If bpf_iter_task_vma_kern contained vma_iterator directly such a change would require change in opaque bpf_iter_task_vma struct's size. So better to allocate vma_iterator using BPF allocator, and since that alloc must already succeed, might as well allocate all iter fields, thereby freezing struct bpf_iter_task_vma size. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20231013204426.1074286-4-davemarchevsky@fb.com Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- kernel/bpf/helpers.c | 3 ++ kernel/bpf/task_iter.c | 91 ++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 94 insertions(+) diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index c889a9dee6c6..fed10499f218 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -2671,6 +2671,9 @@ BTF_ID_FLAGS(func, bpf_dynptr_slice_rdwr, KF_RET_NULL) BTF_ID_FLAGS(func, bpf_iter_num_new, KF_ITER_NEW) BTF_ID_FLAGS(func, bpf_iter_num_next, KF_ITER_NEXT | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY) +BTF_ID_FLAGS(func, bpf_iter_task_vma_new, KF_ITER_NEW | KF_RCU) +BTF_ID_FLAGS(func, bpf_iter_task_vma_next, KF_ITER_NEXT | KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_iter_task_vma_destroy, KF_ITER_DESTROY) BTF_ID_FLAGS(func, bpf_dynptr_adjust) BTF_ID_FLAGS(func, bpf_dynptr_is_null) BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly) diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c index f7ef58090c7d..210128790e6f 100644 --- a/kernel/bpf/task_iter.c +++ b/kernel/bpf/task_iter.c @@ -7,7 +7,9 @@ #include <linux/fs.h> #include <linux/fdtable.h> #include <linux/filter.h> +#include <linux/bpf_mem_alloc.h> #include <linux/btf_ids.h> +#include <linux/mm_types.h> #include "mmap_unlock_work.h" static const char * const iter_task_type_names[] = { @@ -823,6 +825,95 @@ const struct bpf_func_proto bpf_find_vma_proto = { .arg5_type = ARG_ANYTHING, }; +struct bpf_iter_task_vma_kern_data { + struct task_struct *task; + struct mm_struct *mm; + struct mmap_unlock_irq_work *work; + struct vma_iterator vmi; +}; + +struct bpf_iter_task_vma { + /* opaque iterator state; having __u64 here allows to preserve correct + * alignment requirements in vmlinux.h, generated from BTF + */ + __u64 __opaque[1]; +} 
__attribute__((aligned(8))); + +/* Non-opaque version of bpf_iter_task_vma */ +struct bpf_iter_task_vma_kern { + struct bpf_iter_task_vma_kern_data *data; +} __attribute__((aligned(8))); + +__diag_push(); +__diag_ignore_all("-Wmissing-prototypes", + "Global functions as their definitions will be in vmlinux BTF"); + +__bpf_kfunc int bpf_iter_task_vma_new(struct bpf_iter_task_vma *it, + struct task_struct *task, u64 addr) +{ + struct bpf_iter_task_vma_kern *kit = (void *)it; + bool irq_work_busy = false; + int err; + + BUILD_BUG_ON(sizeof(struct bpf_iter_task_vma_kern) != sizeof(struct bpf_iter_task_vma)); + BUILD_BUG_ON(__alignof__(struct bpf_iter_task_vma_kern) != __alignof__(struct bpf_iter_task_vma)); + + /* is_iter_reg_valid_uninit guarantees that kit hasn't been initialized + * before, so non-NULL kit->data doesn't point to previously + * bpf_mem_alloc'd bpf_iter_task_vma_kern_data + */ + kit->data = bpf_mem_alloc(&bpf_global_ma, sizeof(struct bpf_iter_task_vma_kern_data)); + if (!kit->data) + return -ENOMEM; + + kit->data->task = get_task_struct(task); + kit->data->mm = task->mm; + if (!kit->data->mm) { + err = -ENOENT; + goto err_cleanup_iter; + } + + /* kit->data->work == NULL is valid after bpf_mmap_unlock_get_irq_work */ + irq_work_busy = bpf_mmap_unlock_get_irq_work(&kit->data->work); + if (irq_work_busy || !mmap_read_trylock(kit->data->mm)) { + err = -EBUSY; + goto err_cleanup_iter; + } + + vma_iter_init(&kit->data->vmi, kit->data->mm, addr); + return 0; + +err_cleanup_iter: + if (kit->data->task) + put_task_struct(kit->data->task); + bpf_mem_free(&bpf_global_ma, kit->data); + /* NULL kit->data signals failed bpf_iter_task_vma initialization */ + kit->data = NULL; + return err; +} + +__bpf_kfunc struct vm_area_struct *bpf_iter_task_vma_next(struct bpf_iter_task_vma *it) +{ + struct bpf_iter_task_vma_kern *kit = (void *)it; + + if (!kit->data) /* bpf_iter_task_vma_new failed */ + return NULL; + return vma_next(&kit->data->vmi); +} + +__bpf_kfunc void bpf_iter_task_vma_destroy(struct bpf_iter_task_vma *it) +{ + struct bpf_iter_task_vma_kern *kit = (void *)it; + + if (kit->data) { + bpf_mmap_unlock_mm(kit->data->work, kit->data->mm); + put_task_struct(kit->data->task); + bpf_mem_free(&bpf_global_ma, kit->data); + } +} + +__diag_pop(); + DEFINE_PER_CPU(struct mmap_unlock_irq_work, mmap_unlock_work); static void do_mmap_read_unlock(struct irq_work *entry) -- 2.34.1
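To make the intended usage concrete before the selftest lands in the next patch, here is a minimal sketch that calls the three kfuncs directly instead of going through bpf_for_each() (hypothetical program name and attach point; the __ksym declarations below would normally come from bpf_experimental.h):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

extern int bpf_iter_task_vma_new(struct bpf_iter_task_vma *it,
				 struct task_struct *task, __u64 addr) __ksym;
extern struct vm_area_struct *bpf_iter_task_vma_next(struct bpf_iter_task_vma *it) __ksym;
extern void bpf_iter_task_vma_destroy(struct bpf_iter_task_vma *it) __ksym;

unsigned int vma_count;

SEC("raw_tp/sys_enter")
int count_vmas(const void *ctx)
{
	struct task_struct *task = bpf_get_current_task_btf();
	struct bpf_iter_task_vma it;
	struct vm_area_struct *vma;

	/* a failed _new() leaves the iterator in a state where _next()
	 * returns NULL and _destroy() is a no-op; the new/next/destroy
	 * sequence is required by the verifier regardless */
	bpf_iter_task_vma_new(&it, task, 0);
	while ((vma = bpf_iter_task_vma_next(&it)))
		vma_count++;
	bpf_iter_task_vma_destroy(&it);

	return 0;
}

char _license[] SEC("license") = "GPL";

bpf_for_each(task_vma, vma, task, 0) generates essentially the same new/next/destroy sequence.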
From: Dave Marchevsky <davemarchevsky@fb.com> mainline inclusion from mainline-v6.7-rc1 commit e0e1a7a5fc377d54bd792c6368a375d41fc316ef category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- The open-coded task_vma iter added earlier in this series allows for natural iteration over a task's vmas using existing open-coded iter infrastructure, specifically bpf_for_each. This patch adds a test demonstrating this pattern and validating correctness. The vma->vm_start and vma->vm_end addresses of the first 1000 vmas are recorded and compared to /proc/PID/maps output. As expected, both see the same vmas and addresses - with the exception of the [vsyscall] vma - which is explained in a comment in the prog_tests program. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20231013204426.1074286-5-davemarchevsky@fb.com Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- .../testing/selftests/bpf/bpf_experimental.h | 8 +++ .../testing/selftests/bpf/prog_tests/iters.c | 58 +++++++++++++++++++ .../selftests/bpf/progs/iters_task_vma.c | 43 ++++++++++++++ 3 files changed, 109 insertions(+) create mode 100644 tools/testing/selftests/bpf/progs/iters_task_vma.c diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h index 4494eaa9937e..f7d32c0e1480 100644 --- a/tools/testing/selftests/bpf/bpf_experimental.h +++ b/tools/testing/selftests/bpf/bpf_experimental.h @@ -159,6 +159,14 @@ extern void *bpf_percpu_obj_new_impl(__u64 local_type_id, void *meta) __ksym; */ extern void bpf_percpu_obj_drop_impl(void *kptr, void *meta) __ksym; +struct bpf_iter_task_vma; + +extern int bpf_iter_task_vma_new(struct bpf_iter_task_vma *it, + struct task_struct *task, + unsigned long addr) __ksym; +extern struct vm_area_struct *bpf_iter_task_vma_next(struct bpf_iter_task_vma *it) __ksym; +extern void bpf_iter_task_vma_destroy(struct bpf_iter_task_vma *it) __ksym; + /* Convenience macro to wrap over bpf_obj_drop_impl */ #define bpf_percpu_obj_drop(kptr) bpf_percpu_obj_drop_impl(kptr, NULL) diff --git a/tools/testing/selftests/bpf/prog_tests/iters.c b/tools/testing/selftests/bpf/prog_tests/iters.c index 10804ae5ae97..b696873c5455 100644 --- a/tools/testing/selftests/bpf/prog_tests/iters.c +++ b/tools/testing/selftests/bpf/prog_tests/iters.c @@ -8,6 +8,7 @@ #include "iters_looping.skel.h" #include "iters_num.skel.h" #include "iters_testmod_seq.skel.h" +#include "iters_task_vma.skel.h" static void subtest_num_iters(void) { @@ -90,6 +91,61 @@ static void subtest_testmod_seq_iters(void) iters_testmod_seq__destroy(skel); } +static void subtest_task_vma_iters(void) +{ + unsigned long start, end, bpf_iter_start, bpf_iter_end; + struct iters_task_vma *skel; + char rest_of_line[1000]; + unsigned int seen; + FILE *f = NULL; + int err; + + skel = iters_task_vma__open_and_load(); + if (!ASSERT_OK_PTR(skel, "skel_open_and_load")) + return; + + skel->bss->target_pid = getpid(); + + err = iters_task_vma__attach(skel); + if (!ASSERT_OK(err, "skel_attach")) + goto cleanup; + + getpgid(skel->bss->target_pid); + iters_task_vma__detach(skel); + + if (!ASSERT_GT(skel->bss->vmas_seen, 0, "vmas_seen_gt_zero")) + goto cleanup; + + f = fopen("/proc/self/maps", "r"); + if (!ASSERT_OK_PTR(f, "proc_maps_fopen")) + goto cleanup; 
+ + seen = 0; + while (fscanf(f, "%lx-%lx %[^\n]\n", &start, &end, rest_of_line) == 3) { + /* [vsyscall] vma isn't _really_ part of task->mm vmas. + * /proc/PID/maps returns it when out of vmas - see get_gate_vma + * calls in fs/proc/task_mmu.c + */ + if (strstr(rest_of_line, "[vsyscall]")) + continue; + + bpf_iter_start = skel->bss->vm_ranges[seen].vm_start; + bpf_iter_end = skel->bss->vm_ranges[seen].vm_end; + + ASSERT_EQ(bpf_iter_start, start, "vma->vm_start match"); + ASSERT_EQ(bpf_iter_end, end, "vma->vm_end match"); + seen++; + } + + if (!ASSERT_EQ(skel->bss->vmas_seen, seen, "vmas_seen_eq")) + goto cleanup; + +cleanup: + if (f) + fclose(f); + iters_task_vma__destroy(skel); +} + void test_iters(void) { RUN_TESTS(iters_state_safety); @@ -103,4 +159,6 @@ void test_iters(void) subtest_num_iters(); if (test__start_subtest("testmod_seq")) subtest_testmod_seq_iters(); + if (test__start_subtest("task_vma")) + subtest_task_vma_iters(); } diff --git a/tools/testing/selftests/bpf/progs/iters_task_vma.c b/tools/testing/selftests/bpf/progs/iters_task_vma.c new file mode 100644 index 000000000000..44edecfdfaee --- /dev/null +++ b/tools/testing/selftests/bpf/progs/iters_task_vma.c @@ -0,0 +1,43 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. */ + +#include "vmlinux.h" +#include "bpf_experimental.h" +#include <bpf/bpf_helpers.h> +#include "bpf_misc.h" + +pid_t target_pid = 0; +unsigned int vmas_seen = 0; + +struct { + __u64 vm_start; + __u64 vm_end; +} vm_ranges[1000]; + +SEC("raw_tp/sys_enter") +int iter_task_vma_for_each(const void *ctx) +{ + struct task_struct *task = bpf_get_current_task_btf(); + struct vm_area_struct *vma; + unsigned int seen = 0; + + if (task->pid != target_pid) + return 0; + + if (vmas_seen) + return 0; + + bpf_for_each(task_vma, vma, task, 0) { + if (seen >= 1000) + break; + + vm_ranges[seen].vm_start = vma->vm_start; + vm_ranges[seen].vm_end = vma->vm_end; + seen++; + } + + vmas_seen = seen; + return 0; +} + +char _license[] SEC("license") = "GPL"; -- 2.34.1
From: Chuyi Zhou <zhouchuyi@bytedance.com> mainline inclusion from mainline-v6.7-rc1 commit 6da88306811b40a207c94c9da9faf07bdb20776e category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- This patch makes some preparations for using css_task_iter_*() in BPF Program. 1. Flags CSS_TASK_ITER_* are #define-s and it's not easy for bpf prog to use them. Convert them to enum so bpf prog can take them from vmlinux.h. 2. In the next patch we will add css_task_iter_*() in common kfuncs which is not safe. Since css_task_iter_*() does spin_unlock_irq() which might screw up irq flags depending on the context where bpf prog is running. So we should use irqsave/irqrestore here and the switching is harmless. Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20231018061746.111364-2-zhouchuyi@bytedance.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- include/linux/cgroup.h | 12 +++++------- kernel/cgroup/cgroup.c | 18 ++++++++++++------ 2 files changed, 17 insertions(+), 13 deletions(-) diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index e0be4e60aba8..8148e2f65ab7 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -41,13 +41,11 @@ struct kernel_clone_args; #define CGROUP_WEIGHT_DFL 100 #define CGROUP_WEIGHT_MAX 10000 -/* walk only threadgroup leaders */ -#define CSS_TASK_ITER_PROCS (1U << 0) -/* walk all threaded css_sets in the domain */ -#define CSS_TASK_ITER_THREADED (1U << 1) - -/* internal flags */ -#define CSS_TASK_ITER_SKIPPED (1U << 16) +enum { + CSS_TASK_ITER_PROCS = (1U << 0), /* walk only threadgroup leaders */ + CSS_TASK_ITER_THREADED = (1U << 1), /* walk all threaded css_sets in the domain */ + CSS_TASK_ITER_SKIPPED = (1U << 16), /* internal flags */ +}; /* a css_task_iter should be treated as an opaque object */ struct css_task_iter { diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index 17521bc192ee..d0a298cdf532 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -5079,9 +5079,11 @@ static void css_task_iter_advance(struct css_task_iter *it) void css_task_iter_start(struct cgroup_subsys_state *css, unsigned int flags, struct css_task_iter *it) { + unsigned long irqflags; + memset(it, 0, sizeof(*it)); - spin_lock_irq(&css_set_lock); + spin_lock_irqsave(&css_set_lock, irqflags); it->ss = css->ss; it->flags = flags; @@ -5095,7 +5097,7 @@ void css_task_iter_start(struct cgroup_subsys_state *css, unsigned int flags, css_task_iter_advance(it); - spin_unlock_irq(&css_set_lock); + spin_unlock_irqrestore(&css_set_lock, irqflags); } /** @@ -5108,12 +5110,14 @@ void css_task_iter_start(struct cgroup_subsys_state *css, unsigned int flags, */ struct task_struct *css_task_iter_next(struct css_task_iter *it) { + unsigned long irqflags; + if (it->cur_task) { put_task_struct(it->cur_task); it->cur_task = NULL; } - spin_lock_irq(&css_set_lock); + spin_lock_irqsave(&css_set_lock, irqflags); /* @it may be half-advanced by skips, finish advancing */ if (it->flags & CSS_TASK_ITER_SKIPPED) @@ -5126,7 +5130,7 @@ struct task_struct *css_task_iter_next(struct css_task_iter *it) css_task_iter_advance(it); } - spin_unlock_irq(&css_set_lock); + spin_unlock_irqrestore(&css_set_lock, irqflags); 
return it->cur_task; } @@ -5139,11 +5143,13 @@ struct task_struct *css_task_iter_next(struct css_task_iter *it) */ void css_task_iter_end(struct css_task_iter *it) { + unsigned long irqflags; + if (it->cur_cset) { - spin_lock_irq(&css_set_lock); + spin_lock_irqsave(&css_set_lock, irqflags); list_del(&it->iters_node); put_css_set_locked(it->cur_cset); - spin_unlock_irq(&css_set_lock); + spin_unlock_irqrestore(&css_set_lock, irqflags); } if (it->cur_dcset) -- 2.34.1
From: Chuyi Zhou <zhouchuyi@bytedance.com> mainline inclusion from mainline-v6.7-rc1 commit 9c66dc94b62aef23300f05f63404afb8990920b4 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- This patch adds kfuncs bpf_iter_css_task_{new,next,destroy} which allow creation and manipulation of struct bpf_iter_css_task in open-coded iterator style. These kfuncs actually wrapps css_task_iter_{start,next, end}. BPF programs can use these kfuncs through bpf_for_each macro for iteration of all tasks under a css. css_task_iter_*() would try to get the global spin-lock *css_set_lock*, so the bpf side has to be careful in where it allows to use this iter. Currently we only allow it in bpf_lsm and bpf iter-s. Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20231018061746.111364-3-zhouchuyi@bytedance.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Conflicts: kernel/bpf/verifier.c tools/testing/selftests/bpf/bpf_experimental.h [Fix conflicts because of missing some kfuncs and hisock.] Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- kernel/bpf/helpers.c | 3 + kernel/bpf/task_iter.c | 58 +++++++++++++++++++ kernel/bpf/verifier.c | 23 ++++++++ .../testing/selftests/bpf/bpf_experimental.h | 8 +++ 4 files changed, 92 insertions(+) diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index fed10499f218..a698e70c0250 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -2674,6 +2674,9 @@ BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY) BTF_ID_FLAGS(func, bpf_iter_task_vma_new, KF_ITER_NEW | KF_RCU) BTF_ID_FLAGS(func, bpf_iter_task_vma_next, KF_ITER_NEXT | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_iter_task_vma_destroy, KF_ITER_DESTROY) +BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS) +BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY) BTF_ID_FLAGS(func, bpf_dynptr_adjust) BTF_ID_FLAGS(func, bpf_dynptr_is_null) BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly) diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c index 210128790e6f..0a4e497dd6cc 100644 --- a/kernel/bpf/task_iter.c +++ b/kernel/bpf/task_iter.c @@ -914,6 +914,64 @@ __bpf_kfunc void bpf_iter_task_vma_destroy(struct bpf_iter_task_vma *it) __diag_pop(); +struct bpf_iter_css_task { + __u64 __opaque[1]; +} __attribute__((aligned(8))); + +struct bpf_iter_css_task_kern { + struct css_task_iter *css_it; +} __attribute__((aligned(8))); + +__diag_push(); +__diag_ignore_all("-Wmissing-prototypes", + "Global functions as their definitions will be in vmlinux BTF"); + +__bpf_kfunc int bpf_iter_css_task_new(struct bpf_iter_css_task *it, + struct cgroup_subsys_state *css, unsigned int flags) +{ + struct bpf_iter_css_task_kern *kit = (void *)it; + + BUILD_BUG_ON(sizeof(struct bpf_iter_css_task_kern) != sizeof(struct bpf_iter_css_task)); + BUILD_BUG_ON(__alignof__(struct bpf_iter_css_task_kern) != + __alignof__(struct bpf_iter_css_task)); + kit->css_it = NULL; + switch (flags) { + case CSS_TASK_ITER_PROCS | CSS_TASK_ITER_THREADED: + case CSS_TASK_ITER_PROCS: + case 0: + break; + default: + return -EINVAL; + } + + kit->css_it = bpf_mem_alloc(&bpf_global_ma, sizeof(struct css_task_iter)); + if (!kit->css_it) + return -ENOMEM; + css_task_iter_start(css, flags, kit->css_it); 
+ return 0; +} + +__bpf_kfunc struct task_struct *bpf_iter_css_task_next(struct bpf_iter_css_task *it) +{ + struct bpf_iter_css_task_kern *kit = (void *)it; + + if (!kit->css_it) + return NULL; + return css_task_iter_next(kit->css_it); +} + +__bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it) +{ + struct bpf_iter_css_task_kern *kit = (void *)it; + + if (!kit->css_it) + return; + css_task_iter_end(kit->css_it); + bpf_mem_free(&bpf_global_ma, kit->css_it); +} + +__diag_pop(); + DEFINE_PER_CPU(struct mmap_unlock_irq_work, mmap_unlock_work); static void do_mmap_read_unlock(struct irq_work *entry) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 9ff0704cef3d..c5ed5ed56532 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -10815,6 +10815,7 @@ enum special_kfunc_type { KF_bpf_dynptr_slice, KF_bpf_dynptr_slice_rdwr, KF_bpf_dynptr_clone, + KF_bpf_iter_css_task_new, #ifdef CONFIG_HISOCK KF_bpf_set_ingress_dst, KF_bpf_set_ingress_dev, @@ -10843,6 +10844,7 @@ BTF_ID(func, bpf_dynptr_from_xdp) BTF_ID(func, bpf_dynptr_slice) BTF_ID(func, bpf_dynptr_slice_rdwr) BTF_ID(func, bpf_dynptr_clone) +BTF_ID(func, bpf_iter_css_task_new) #ifdef CONFIG_HISOCK BTF_ID(func, bpf_set_ingress_dst) BTF_ID(func, bpf_set_ingress_dev) @@ -10873,6 +10875,7 @@ BTF_ID(func, bpf_dynptr_from_xdp) BTF_ID(func, bpf_dynptr_slice) BTF_ID(func, bpf_dynptr_slice_rdwr) BTF_ID(func, bpf_dynptr_clone) +BTF_ID(func, bpf_iter_css_task_new) #ifdef CONFIG_HISOCK BTF_ID(func, bpf_set_ingress_dst) BTF_ID(func, bpf_set_ingress_dev) @@ -11405,6 +11408,20 @@ static int process_kf_arg_ptr_to_rbtree_node(struct bpf_verifier_env *env, &meta->arg_rbtree_root.field); } +static bool check_css_task_iter_allowlist(struct bpf_verifier_env *env) +{ + enum bpf_prog_type prog_type = resolve_prog_type(env->prog); + + switch (prog_type) { + case BPF_PROG_TYPE_LSM: + return true; + case BPF_TRACE_ITER: + return env->prog->aux->sleepable; + default: + return false; + } +} + static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta, int insn_idx) { @@ -11646,6 +11663,12 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_ break; } case KF_ARG_PTR_TO_ITER: + if (meta->func_id == special_kfunc_list[KF_bpf_iter_css_task_new]) { + if (!check_css_task_iter_allowlist(env)) { + verbose(env, "css_task_iter is only allowed in bpf_lsm and bpf iter-s\n"); + return -EINVAL; + } + } ret = process_iter_arg(env, regno, insn_idx, meta); if (ret < 0) return ret; diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h index f7d32c0e1480..1d9dc535062f 100644 --- a/tools/testing/selftests/bpf/bpf_experimental.h +++ b/tools/testing/selftests/bpf/bpf_experimental.h @@ -170,4 +170,12 @@ extern void bpf_iter_task_vma_destroy(struct bpf_iter_task_vma *it) __ksym; /* Convenience macro to wrap over bpf_obj_drop_impl */ #define bpf_percpu_obj_drop(kptr) bpf_percpu_obj_drop_impl(kptr, NULL) +struct bpf_iter_css_task; +struct cgroup_subsys_state; +extern int bpf_iter_css_task_new(struct bpf_iter_css_task *it, + struct cgroup_subsys_state *css, unsigned int flags) __weak __ksym; +extern struct task_struct *bpf_iter_css_task_next(struct bpf_iter_css_task *it) __weak __ksym; +extern void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it) __weak __ksym; + + #endif -- 2.34.1
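Since the allowlist above restricts this iterator to LSM and BPF-iter programs, a usage sketch has to live in one of those contexts. A minimal, untested example (hypothetical program and variable names; it additionally assumes the existing bpf_cgroup_from_id()/bpf_cgroup_release() kfuncs and a cgroup id filled in from user space):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

extern int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
				 struct cgroup_subsys_state *css, unsigned int flags) __ksym;
extern struct task_struct *bpf_iter_css_task_next(struct bpf_iter_css_task *it) __ksym;
extern void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it) __ksym;
extern struct cgroup *bpf_cgroup_from_id(u64 cgid) __ksym;
extern void bpf_cgroup_release(struct cgroup *cgrp) __ksym;

u64 cg_id;	/* cgroup id, set from user space before attaching */
u32 nr_procs;

SEC("lsm.s/file_open")
int BPF_PROG(count_cgroup_procs, struct file *file)
{
	struct bpf_iter_css_task it;
	struct task_struct *task;
	struct cgroup *cgrp;

	cgrp = bpf_cgroup_from_id(cg_id);
	if (!cgrp)
		return 0;

	/* CSS_TASK_ITER_PROCS is usable directly from vmlinux.h thanks to
	 * the enum conversion earlier in this series */
	bpf_iter_css_task_new(&it, &cgrp->self, CSS_TASK_ITER_PROCS);
	while ((task = bpf_iter_css_task_next(&it)))
		nr_procs++;
	bpf_iter_css_task_destroy(&it);

	bpf_cgroup_release(cgrp);
	return 0;
}

char _license[] SEC("license") = "GPL";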
From: Chuyi Zhou <zhouchuyi@bytedance.com> mainline inclusion from mainline-v6.7-rc1 commit c68a78ffe2cb4207f64fd0f4262818c728c67be0 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- This patch adds kfuncs bpf_iter_task_{new,next,destroy} which allow creation and manipulation of struct bpf_iter_task in open-coded iterator style. BPF programs can use these kfuncs or through bpf_for_each macro to iterate all processes in the system. The API design keep consistent with SEC("iter/task"). bpf_iter_task_new() accepts a specific task and iterating type which allows: 1. iterating all process in the system (BPF_TASK_ITER_ALL_PROCS) 2. iterating all threads in the system (BPF_TASK_ITER_ALL_THREADS) 3. iterating all threads of a specific task (BPF_TASK_ITER_PROC_THREADS) Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com> Link: https://lore.kernel.org/r/20231018061746.111364-4-zhouchuyi@bytedance.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- kernel/bpf/helpers.c | 3 + kernel/bpf/task_iter.c | 90 +++++++++++++++++++ .../testing/selftests/bpf/bpf_experimental.h | 5 ++ 3 files changed, 98 insertions(+) diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index a698e70c0250..e6c44a165a85 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -2677,6 +2677,9 @@ BTF_ID_FLAGS(func, bpf_iter_task_vma_destroy, KF_ITER_DESTROY) BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS) BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY) +BTF_ID_FLAGS(func, bpf_iter_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS) +BTF_ID_FLAGS(func, bpf_iter_task_next, KF_ITER_NEXT | KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_iter_task_destroy, KF_ITER_DESTROY) BTF_ID_FLAGS(func, bpf_dynptr_adjust) BTF_ID_FLAGS(func, bpf_dynptr_is_null) BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly) diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c index 0a4e497dd6cc..5e9a290f868b 100644 --- a/kernel/bpf/task_iter.c +++ b/kernel/bpf/task_iter.c @@ -972,6 +972,96 @@ __bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it) __diag_pop(); +struct bpf_iter_task { + __u64 __opaque[3]; +} __attribute__((aligned(8))); + +struct bpf_iter_task_kern { + struct task_struct *task; + struct task_struct *pos; + unsigned int flags; +} __attribute__((aligned(8))); + +enum { + /* all process in the system */ + BPF_TASK_ITER_ALL_PROCS, + /* all threads in the system */ + BPF_TASK_ITER_ALL_THREADS, + /* all threads of a specific process */ + BPF_TASK_ITER_PROC_THREADS +}; + +__diag_push(); +__diag_ignore_all("-Wmissing-prototypes", + "Global functions as their definitions will be in vmlinux BTF"); + +__bpf_kfunc int bpf_iter_task_new(struct bpf_iter_task *it, + struct task_struct *task, unsigned int flags) +{ + struct bpf_iter_task_kern *kit = (void *)it; + + BUILD_BUG_ON(sizeof(struct bpf_iter_task_kern) > sizeof(struct bpf_iter_task)); + BUILD_BUG_ON(__alignof__(struct bpf_iter_task_kern) != + __alignof__(struct bpf_iter_task)); + + kit->task = kit->pos = NULL; + switch (flags) { + case BPF_TASK_ITER_ALL_THREADS: + case BPF_TASK_ITER_ALL_PROCS: + case BPF_TASK_ITER_PROC_THREADS: + break; + default: + return -EINVAL; + } + + if (flags == BPF_TASK_ITER_PROC_THREADS) + kit->task = task; + else 
+ kit->task = &init_task; + kit->pos = kit->task; + kit->flags = flags; + return 0; +} + +__bpf_kfunc struct task_struct *bpf_iter_task_next(struct bpf_iter_task *it) +{ + struct bpf_iter_task_kern *kit = (void *)it; + struct task_struct *pos; + unsigned int flags; + + flags = kit->flags; + pos = kit->pos; + + if (!pos) + return pos; + + if (flags == BPF_TASK_ITER_ALL_PROCS) + goto get_next_task; + + kit->pos = next_thread(kit->pos); + if (kit->pos == kit->task) { + if (flags == BPF_TASK_ITER_PROC_THREADS) { + kit->pos = NULL; + return pos; + } + } else + return pos; + +get_next_task: + kit->pos = next_task(kit->pos); + kit->task = kit->pos; + if (kit->pos == &init_task) + kit->pos = NULL; + + return pos; +} + +__bpf_kfunc void bpf_iter_task_destroy(struct bpf_iter_task *it) +{ +} + +__diag_pop(); + DEFINE_PER_CPU(struct mmap_unlock_irq_work, mmap_unlock_work); static void do_mmap_read_unlock(struct irq_work *entry) diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h index 1d9dc535062f..2a67e2819f01 100644 --- a/tools/testing/selftests/bpf/bpf_experimental.h +++ b/tools/testing/selftests/bpf/bpf_experimental.h @@ -177,5 +177,10 @@ extern int bpf_iter_css_task_new(struct bpf_iter_css_task *it, extern struct task_struct *bpf_iter_css_task_next(struct bpf_iter_css_task *it) __weak __ksym; extern void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it) __weak __ksym; +struct bpf_iter_task; +extern int bpf_iter_task_new(struct bpf_iter_task *it, + struct task_struct *task, unsigned int flags) __weak __ksym; +extern struct task_struct *bpf_iter_task_next(struct bpf_iter_task *it) __weak __ksym; +extern void bpf_iter_task_destroy(struct bpf_iter_task *it) __weak __ksym; #endif -- 2.34.1
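For illustration, a minimal sketch of a BPF program driving these kfuncs by hand, without the bpf_for_each() convenience macro. The attach point, variable names and program body are assumptions for the sketch, not part of the patch; the extern declarations come from the bpf_experimental.h hunk above, and the struct bpf_iter_task definition is picked up from vmlinux.h once a kernel with this change is used:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include "bpf_experimental.h"

char _license[] SEC("license") = "GPL";

int nr_threads;	/* read back from user space */

SEC("tp_btf/task_newtask")	/* assumed attach point, any tracing hook works */
int count_current_threads(void *ctx)
{
	/* bpf_iter_task_new() is KF_TRUSTED_ARGS, so pass a trusted task pointer */
	struct task_struct *cur = bpf_get_current_task_btf();
	struct task_struct *pos;
	struct bpf_iter_task it;

	/* walk every thread of the current process */
	bpf_iter_task_new(&it, cur, BPF_TASK_ITER_PROC_THREADS);
	while ((pos = bpf_iter_task_next(&it)))
		nr_threads++;
	/* destroy must run on every path; next() returns NULL if new() failed */
	bpf_iter_task_destroy(&it);
	return 0;
}

The bpf_for_each(task, pos, cur, BPF_TASK_ITER_PROC_THREADS) form used in the selftests later in this series expands to roughly this new/next/destroy pattern.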
From: Chuyi Zhou <zhouchuyi@bytedance.com> mainline inclusion from mainline-v6.7-rc1 commit 7251d0905e7518bcb990c8e9a3615b1bb23c78f2 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- This Patch adds kfuncs bpf_iter_css_{new,next,destroy} which allow creation and manipulation of struct bpf_iter_css in open-coded iterator style. These kfuncs actually wrapps css_next_descendant_{pre, post}. css_iter can be used to: 1) iterating a sepcific cgroup tree with pre/post/up order 2) iterating cgroup_subsystem in BPF Prog, like for_each_mem_cgroup_tree/cpuset_for_each_descendant_pre in kernel. The API design is consistent with cgroup_iter. bpf_iter_css_new accepts parameters defining iteration order and starting css. Here we also reuse BPF_CGROUP_ITER_DESCENDANTS_PRE, BPF_CGROUP_ITER_DESCENDANTS_POST, BPF_CGROUP_ITER_ANCESTORS_UP enums. Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20231018061746.111364-5-zhouchuyi@bytedance.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- kernel/bpf/cgroup_iter.c | 65 +++++++++++++++++++ kernel/bpf/helpers.c | 3 + .../testing/selftests/bpf/bpf_experimental.h | 6 ++ 3 files changed, 74 insertions(+) diff --git a/kernel/bpf/cgroup_iter.c b/kernel/bpf/cgroup_iter.c index 810378f04fbc..209e5135f9fb 100644 --- a/kernel/bpf/cgroup_iter.c +++ b/kernel/bpf/cgroup_iter.c @@ -294,3 +294,68 @@ static int __init bpf_cgroup_iter_init(void) } late_initcall(bpf_cgroup_iter_init); + +struct bpf_iter_css { + __u64 __opaque[3]; +} __attribute__((aligned(8))); + +struct bpf_iter_css_kern { + struct cgroup_subsys_state *start; + struct cgroup_subsys_state *pos; + unsigned int flags; +} __attribute__((aligned(8))); + +__diag_push(); +__diag_ignore_all("-Wmissing-prototypes", + "Global functions as their definitions will be in vmlinux BTF"); + +__bpf_kfunc int bpf_iter_css_new(struct bpf_iter_css *it, + struct cgroup_subsys_state *start, unsigned int flags) +{ + struct bpf_iter_css_kern *kit = (void *)it; + + BUILD_BUG_ON(sizeof(struct bpf_iter_css_kern) > sizeof(struct bpf_iter_css)); + BUILD_BUG_ON(__alignof__(struct bpf_iter_css_kern) != __alignof__(struct bpf_iter_css)); + + kit->start = NULL; + switch (flags) { + case BPF_CGROUP_ITER_DESCENDANTS_PRE: + case BPF_CGROUP_ITER_DESCENDANTS_POST: + case BPF_CGROUP_ITER_ANCESTORS_UP: + break; + default: + return -EINVAL; + } + + kit->start = start; + kit->pos = NULL; + kit->flags = flags; + return 0; +} + +__bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_next(struct bpf_iter_css *it) +{ + struct bpf_iter_css_kern *kit = (void *)it; + + if (!kit->start) + return NULL; + + switch (kit->flags) { + case BPF_CGROUP_ITER_DESCENDANTS_PRE: + kit->pos = css_next_descendant_pre(kit->pos, kit->start); + break; + case BPF_CGROUP_ITER_DESCENDANTS_POST: + kit->pos = css_next_descendant_post(kit->pos, kit->start); + break; + case BPF_CGROUP_ITER_ANCESTORS_UP: + kit->pos = kit->pos ? 
kit->pos->parent : kit->start; + } + + return kit->pos; +} + +__bpf_kfunc void bpf_iter_css_destroy(struct bpf_iter_css *it) +{ +} + +__diag_pop(); \ No newline at end of file diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index e6c44a165a85..a9f9ac4d04ca 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -2680,6 +2680,9 @@ BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY) BTF_ID_FLAGS(func, bpf_iter_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS) BTF_ID_FLAGS(func, bpf_iter_task_next, KF_ITER_NEXT | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_iter_task_destroy, KF_ITER_DESTROY) +BTF_ID_FLAGS(func, bpf_iter_css_new, KF_ITER_NEW | KF_TRUSTED_ARGS) +BTF_ID_FLAGS(func, bpf_iter_css_next, KF_ITER_NEXT | KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_iter_css_destroy, KF_ITER_DESTROY) BTF_ID_FLAGS(func, bpf_dynptr_adjust) BTF_ID_FLAGS(func, bpf_dynptr_is_null) BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly) diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h index 2a67e2819f01..facf41fec72f 100644 --- a/tools/testing/selftests/bpf/bpf_experimental.h +++ b/tools/testing/selftests/bpf/bpf_experimental.h @@ -183,4 +183,10 @@ extern int bpf_iter_task_new(struct bpf_iter_task *it, extern struct task_struct *bpf_iter_task_next(struct bpf_iter_task *it) __weak __ksym; extern void bpf_iter_task_destroy(struct bpf_iter_task *it) __weak __ksym; +struct bpf_iter_css; +extern int bpf_iter_css_new(struct bpf_iter_css *it, + struct cgroup_subsys_state *start, unsigned int flags) __weak __ksym; +extern struct cgroup_subsys_state *bpf_iter_css_next(struct bpf_iter_css *it) __weak __ksym; +extern void bpf_iter_css_destroy(struct bpf_iter_css *it) __weak __ksym; + #endif -- 2.34.1
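A similarly hedged sketch for the css iterator, counting the css nodes of a subtree in pre-order (the walk includes the root css itself). The attach point, the bpf_cgroup_from_id() based lookup and the globals are illustrative assumptions, not part of the patch:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include "bpf_experimental.h"

char _license[] SEC("license") = "GPL";

struct cgroup *bpf_cgroup_from_id(u64 cgid) __ksym;
void bpf_cgroup_release(struct cgroup *p) __ksym;

u64 target_cg_id;	/* set from user space */
int nr_css;

SEC("tp_btf/task_newtask")	/* assumed attach point */
int count_subtree_css(void *ctx)
{
	struct cgroup *cgrp = bpf_cgroup_from_id(target_cg_id);
	struct cgroup_subsys_state *pos;
	struct bpf_iter_css it;

	if (!cgrp)
		return 0;

	/* pre-order walk of the subtree rooted at cgrp->self */
	bpf_iter_css_new(&it, &cgrp->self, BPF_CGROUP_ITER_DESCENDANTS_PRE);
	while ((pos = bpf_iter_css_next(&it)))
		nr_css++;
	bpf_iter_css_destroy(&it);

	bpf_cgroup_release(cgrp);
	return 0;
}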
From: Chuyi Zhou <zhouchuyi@bytedance.com> mainline inclusion from mainline-v6.7-rc1 commit dfab99df147b0d364f0c199f832ff2aedfb2265a category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- css_iter and task_iter should be used in rcu section. Specifically, in sleepable progs explicit bpf_rcu_read_lock() is needed before use these iters. In normal bpf progs that have implicit rcu_read_lock(), it's OK to use them directly. This patch adds a new a KF flag KF_RCU_PROTECTED for bpf_iter_task_new and bpf_iter_css_new. It means the kfunc should be used in RCU CS. We check whether we are in rcu cs before we want to invoke this kfunc. If the rcu protection is guaranteed, we would let st->type = PTR_TO_STACK | MEM_RCU. Once user do rcu_unlock during the iteration, state MEM_RCU of regs would be cleared. is_iter_reg_valid_init() will reject if reg->type is UNTRUSTED. It is worth noting that currently, bpf_rcu_read_unlock does not clear the state of the STACK_ITER reg, since bpf_for_each_spilled_reg only considers STACK_SPILL. This patch also let bpf_for_each_spilled_reg search STACK_ITER. Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231018061746.111364-6-zhouchuyi@bytedance.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Conflicts: include/linux/bpf_verifier.h [Fix conflicts because of kabi.] Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- include/linux/bpf_verifier.h | 19 ++++++++------ include/linux/btf.h | 1 + kernel/bpf/helpers.c | 4 +-- kernel/bpf/verifier.c | 50 ++++++++++++++++++++++++++++-------- 4 files changed, 53 insertions(+), 21 deletions(-) diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index b7ff22ed0fd7..e1d213f2585d 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -473,19 +473,18 @@ struct bpf_verifier_state { KABI_RESERVE(4) }; -#define bpf_get_spilled_reg(slot, frame) \ +#define bpf_get_spilled_reg(slot, frame, mask) \ (((slot < frame->allocated_stack / BPF_REG_SIZE) && \ - (frame->stack[slot].slot_type[0] == STACK_SPILL)) \ + ((1 << frame->stack[slot].slot_type[0]) & (mask))) \ ? &frame->stack[slot].spilled_ptr : NULL) /* Iterate over 'frame', setting 'reg' to either NULL or a spilled register. 
*/ -#define bpf_for_each_spilled_reg(iter, frame, reg) \ - for (iter = 0, reg = bpf_get_spilled_reg(iter, frame); \ +#define bpf_for_each_spilled_reg(iter, frame, reg, mask) \ + for (iter = 0, reg = bpf_get_spilled_reg(iter, frame, mask); \ iter < frame->allocated_stack / BPF_REG_SIZE; \ - iter++, reg = bpf_get_spilled_reg(iter, frame)) + iter++, reg = bpf_get_spilled_reg(iter, frame, mask)) -/* Invoke __expr over regsiters in __vst, setting __state and __reg */ -#define bpf_for_each_reg_in_vstate(__vst, __state, __reg, __expr) \ +#define bpf_for_each_reg_in_vstate_mask(__vst, __state, __reg, __mask, __expr) \ ({ \ struct bpf_verifier_state *___vstate = __vst; \ int ___i, ___j; \ @@ -497,7 +496,7 @@ struct bpf_verifier_state { __reg = &___regs[___j]; \ (void)(__expr); \ } \ - bpf_for_each_spilled_reg(___j, __state, __reg) { \ + bpf_for_each_spilled_reg(___j, __state, __reg, __mask) { \ if (!__reg) \ continue; \ (void)(__expr); \ @@ -505,6 +504,10 @@ struct bpf_verifier_state { } \ }) +/* Invoke __expr over regsiters in __vst, setting __state and __reg */ +#define bpf_for_each_reg_in_vstate(__vst, __state, __reg, __expr) \ + bpf_for_each_reg_in_vstate_mask(__vst, __state, __reg, 1 << STACK_SPILL, __expr) + /* linked list of verifier states used to prune search */ struct bpf_verifier_state_list { struct bpf_verifier_state state; diff --git a/include/linux/btf.h b/include/linux/btf.h index 928113a80a95..c2231c64d60b 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -74,6 +74,7 @@ #define KF_ITER_NEW (1 << 8) /* kfunc implements BPF iter constructor */ #define KF_ITER_NEXT (1 << 9) /* kfunc implements BPF iter next method */ #define KF_ITER_DESTROY (1 << 10) /* kfunc implements BPF iter destructor */ +#define KF_RCU_PROTECTED (1 << 11) /* kfunc should be protected by rcu cs when they are invoked */ /* * Tag marking a kernel function as a kfunc. 
This is meant to minimize the diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index a9f9ac4d04ca..112588659104 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -2677,10 +2677,10 @@ BTF_ID_FLAGS(func, bpf_iter_task_vma_destroy, KF_ITER_DESTROY) BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS) BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY) -BTF_ID_FLAGS(func, bpf_iter_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS) +BTF_ID_FLAGS(func, bpf_iter_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS | KF_RCU_PROTECTED) BTF_ID_FLAGS(func, bpf_iter_task_next, KF_ITER_NEXT | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_iter_task_destroy, KF_ITER_DESTROY) -BTF_ID_FLAGS(func, bpf_iter_css_new, KF_ITER_NEW | KF_TRUSTED_ARGS) +BTF_ID_FLAGS(func, bpf_iter_css_new, KF_ITER_NEW | KF_TRUSTED_ARGS | KF_RCU_PROTECTED) BTF_ID_FLAGS(func, bpf_iter_css_next, KF_ITER_NEXT | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_iter_css_destroy, KF_ITER_DESTROY) BTF_ID_FLAGS(func, bpf_dynptr_adjust) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index c5ed5ed56532..34eb18445188 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -1191,7 +1191,12 @@ static bool is_dynptr_type_expected(struct bpf_verifier_env *env, struct bpf_reg static void __mark_reg_known_zero(struct bpf_reg_state *reg); +static bool in_rcu_cs(struct bpf_verifier_env *env); + +static bool is_kfunc_rcu_protected(struct bpf_kfunc_call_arg_meta *meta); + static int mark_stack_slots_iter(struct bpf_verifier_env *env, + struct bpf_kfunc_call_arg_meta *meta, struct bpf_reg_state *reg, int insn_idx, struct btf *btf, u32 btf_id, int nr_slots) { @@ -1212,6 +1217,12 @@ static int mark_stack_slots_iter(struct bpf_verifier_env *env, __mark_reg_known_zero(st); st->type = PTR_TO_STACK; /* we don't have dedicated reg type */ + if (is_kfunc_rcu_protected(meta)) { + if (in_rcu_cs(env)) + st->type |= MEM_RCU; + else + st->type |= PTR_UNTRUSTED; + } st->live |= REG_LIVE_WRITTEN; st->ref_obj_id = i == 0 ? 
id : 0; st->iter.btf = btf; @@ -1286,7 +1297,7 @@ static bool is_iter_reg_valid_uninit(struct bpf_verifier_env *env, return true; } -static bool is_iter_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_state *reg, +static int is_iter_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_state *reg, struct btf *btf, u32 btf_id, int nr_slots) { struct bpf_func_state *state = func(env, reg); @@ -1294,26 +1305,28 @@ static bool is_iter_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_ spi = iter_get_spi(env, reg, nr_slots); if (spi < 0) - return false; + return -EINVAL; for (i = 0; i < nr_slots; i++) { struct bpf_stack_state *slot = &state->stack[spi - i]; struct bpf_reg_state *st = &slot->spilled_ptr; + if (st->type & PTR_UNTRUSTED) + return -EPROTO; /* only main (first) slot has ref_obj_id set */ if (i == 0 && !st->ref_obj_id) - return false; + return -EINVAL; if (i != 0 && st->ref_obj_id) - return false; + return -EINVAL; if (st->iter.btf != btf || st->iter.btf_id != btf_id) - return false; + return -EINVAL; for (j = 0; j < BPF_REG_SIZE; j++) if (slot->slot_type[j] != STACK_ITER) - return false; + return -EINVAL; } - return true; + return 0; } /* Check if given stack slot is "special": @@ -7841,15 +7854,24 @@ static int process_iter_arg(struct bpf_verifier_env *env, int regno, int insn_id return err; } - err = mark_stack_slots_iter(env, reg, insn_idx, meta->btf, btf_id, nr_slots); + err = mark_stack_slots_iter(env, meta, reg, insn_idx, meta->btf, btf_id, nr_slots); if (err) return err; } else { /* iter_next() or iter_destroy() expect initialized iter state*/ - if (!is_iter_reg_valid_init(env, reg, meta->btf, btf_id, nr_slots)) { + err = is_iter_reg_valid_init(env, reg, meta->btf, btf_id, nr_slots); + switch (err) { + case 0: + break; + case -EINVAL: verbose(env, "expected an initialized iter_%s as arg #%d\n", iter_type_str(meta->btf, btf_id), regno); - return -EINVAL; + return err; + case -EPROTO: + verbose(env, "expected an RCU CS when using %s\n", meta->func_name); + return err; + default: + return err; } spi = iter_get_spi(env, reg, nr_slots); @@ -10577,6 +10599,11 @@ static bool is_kfunc_rcu(struct bpf_kfunc_call_arg_meta *meta) return meta->kfunc_flags & KF_RCU; } +static bool is_kfunc_rcu_protected(struct bpf_kfunc_call_arg_meta *meta) +{ + return meta->kfunc_flags & KF_RCU_PROTECTED; +} + static bool __kfunc_param_match_suffix(const struct btf *btf, const struct btf_param *arg, const char *suffix) @@ -11964,6 +11991,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, if (env->cur_state->active_rcu_lock) { struct bpf_func_state *state; struct bpf_reg_state *reg; + u32 clear_mask = (1 << STACK_SPILL) | (1 << STACK_ITER); if (in_rbtree_lock_required_cb(env) && (rcu_lock || rcu_unlock)) { verbose(env, "Calling bpf_rcu_read_{lock,unlock} in unnecessary rbtree callback\n"); @@ -11974,7 +12002,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, verbose(env, "nested rcu read lock (kernel function %s)\n", func_name); return -EINVAL; } else if (rcu_unlock) { - bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({ + bpf_for_each_reg_in_vstate_mask(env->cur_state, state, reg, clear_mask, ({ if (reg->type & MEM_RCU) { reg->type &= ~(MEM_RCU | PTR_MAYBE_NULL); reg->type |= PTR_UNTRUSTED; -- 2.34.1
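The practical consequence for BPF authors: in sleepable programs the task and css iterators must now sit inside an explicit RCU read-side critical section, otherwise the verifier rejects the program with the new "expected an RCU CS when using ..." message. A minimal sketch, assuming a sleepable LSM hook (the hook choice and the counter are illustrative only; note that at this point in the series a valid trusted task still has to be passed even for the ALL_PROCS mode, which the next patch relaxes):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include "bpf_experimental.h"

char _license[] SEC("license") = "GPL";

void bpf_rcu_read_lock(void) __ksym;
void bpf_rcu_read_unlock(void) __ksym;

int nr_procs;

SEC("lsm.s/file_mprotect")	/* assumed sleepable hook */
int BPF_PROG(count_procs_sleepable)
{
	struct task_struct *cur = bpf_get_current_task_btf();
	struct task_struct *pos;
	struct bpf_iter_task it;

	bpf_rcu_read_lock();	/* required: bpf_iter_task_new is KF_RCU_PROTECTED */
	bpf_iter_task_new(&it, cur, BPF_TASK_ITER_ALL_PROCS);
	while ((pos = bpf_iter_task_next(&it)))
		nr_procs++;
	bpf_iter_task_destroy(&it);
	bpf_rcu_read_unlock();
	return 0;
}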
From: Chuyi Zhou <zhouchuyi@bytedance.com> mainline inclusion from mainline-v6.7-rc1 commit cb3ecf7915a1d7ce5304402f4d8616d9fa5193f7 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- When using task_iter to iterate all threads of a specific task, we enforce that the user must pass a valid task pointer to ensure safety. However, when iterating all threads/process in the system, BPF verifier still require a valid ptr instead of "nullable" pointer, even though it's pointless, which is a kind of surprising from usability standpoint. It would be nice if we could let that kfunc accept a explicit null pointer when we are using BPF_TASK_ITER_ALL_{PROCS, THREADS} and a valid pointer when using BPF_TASK_ITER_THREAD. Given a trival kfunc: __bpf_kfunc void FN(struct TYPE_A *obj); BPF Prog would reject a nullptr for obj. The error info is: "arg#x pointer type xx xx must point to scalar, or struct with scalar" reported by get_kfunc_ptr_arg_type(). The reg->type is SCALAR_VALUE and the btf type of ref_t is not scalar or scalar_struct which leads to the rejection of get_kfunc_ptr_arg_type. This patch add "__nullable" annotation: __bpf_kfunc void FN(struct TYPE_A *obj__nullable); Here __nullable indicates obj can be optional, user can pass a explicit nullptr or a normal TYPE_A pointer. In get_kfunc_ptr_arg_type(), we will detect whether the current arg is optional and register is null, If so, return a new kfunc_ptr_arg_type KF_ARG_PTR_TO_NULL and skip to the next arg in check_kfunc_args(). Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231018061746.111364-7-zhouchuyi@bytedance.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- kernel/bpf/task_iter.c | 7 +++++-- kernel/bpf/verifier.c | 13 ++++++++++++- 2 files changed, 17 insertions(+), 3 deletions(-) diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c index 5e9a290f868b..a663c3d11b62 100644 --- a/kernel/bpf/task_iter.c +++ b/kernel/bpf/task_iter.c @@ -996,7 +996,7 @@ __diag_ignore_all("-Wmissing-prototypes", "Global functions as their definitions will be in vmlinux BTF"); __bpf_kfunc int bpf_iter_task_new(struct bpf_iter_task *it, - struct task_struct *task, unsigned int flags) + struct task_struct *task__nullable, unsigned int flags) { struct bpf_iter_task_kern *kit = (void *)it; @@ -1008,14 +1008,17 @@ __bpf_kfunc int bpf_iter_task_new(struct bpf_iter_task *it, switch (flags) { case BPF_TASK_ITER_ALL_THREADS: case BPF_TASK_ITER_ALL_PROCS: + break; case BPF_TASK_ITER_PROC_THREADS: + if (!task__nullable) + return -EINVAL; break; default: return -EINVAL; } if (flags == BPF_TASK_ITER_PROC_THREADS) - kit->task = task; + kit->task = task__nullable; else kit->task = &init_task; kit->pos = kit->task; diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 34eb18445188..b0e078f2d9c4 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -10678,6 +10678,11 @@ static bool is_kfunc_arg_refcounted_kptr(const struct btf *btf, const struct btf return __kfunc_param_match_suffix(btf, arg, "__refcounted_kptr"); } +static bool is_kfunc_arg_nullable(const struct btf *btf, const struct btf_param *arg) +{ + return __kfunc_param_match_suffix(btf, arg, "__nullable"); +} + static bool 
is_kfunc_arg_scalar_with_name(const struct btf *btf, const struct btf_param *arg, const char *name) @@ -10820,6 +10825,7 @@ enum kfunc_ptr_arg_type { KF_ARG_PTR_TO_CALLBACK, KF_ARG_PTR_TO_RB_ROOT, KF_ARG_PTR_TO_RB_NODE, + KF_ARG_PTR_TO_NULL, }; enum special_kfunc_type { @@ -10991,6 +10997,8 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env, if (is_kfunc_arg_callback(env, meta->btf, &args[argno])) return KF_ARG_PTR_TO_CALLBACK; + if (is_kfunc_arg_nullable(meta->btf, &args[argno]) && register_is_null(reg)) + return KF_ARG_PTR_TO_NULL; if (argno + 1 < nargs && (is_kfunc_arg_mem_size(meta->btf, &args[argno + 1], ®s[regno + 1]) || @@ -11535,7 +11543,8 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_ } if ((is_kfunc_trusted_args(meta) || is_kfunc_rcu(meta)) && - (register_is_null(reg) || type_may_be_null(reg->type))) { + (register_is_null(reg) || type_may_be_null(reg->type)) && + !is_kfunc_arg_nullable(meta->btf, &args[i])) { verbose(env, "Possibly NULL pointer passed to trusted arg%d\n", i); return -EACCES; } @@ -11560,6 +11569,8 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_ return kf_arg_type; switch (kf_arg_type) { + case KF_ARG_PTR_TO_NULL: + continue; case KF_ARG_PTR_TO_ALLOC_BTF_ID: case KF_ARG_PTR_TO_BTF_ID: if (!is_kfunc_trusted_args(meta) && !is_kfunc_rcu(meta)) -- 2.34.1
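From the kfunc author's side the annotation is just a suffix on the parameter name that the verifier now recognizes. A hedged kernel-side sketch with a hypothetical kfunc (bpf_task_comm_log is not part of the patch and would additionally need to be registered in a kfunc ID set to be callable):

/* kernel-side sketch: the __nullable suffix lets BPF callers pass either a
 * trusted task pointer or an explicit NULL, so the body must handle NULL.
 */
__bpf_kfunc void bpf_task_comm_log(struct task_struct *task__nullable)
{
	if (!task__nullable)
		task__nullable = current;	/* fall back to the calling task */

	pr_info("comm: %s\n", task__nullable->comm);
}

On the BPF program side, both bpf_task_comm_log(NULL) and bpf_task_comm_log(trusted_task) would then be accepted by the verifier.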
From: Chuyi Zhou <zhouchuyi@bytedance.com> mainline inclusion from mainline-v6.7-rc1 commit ddab78cbb52f81f7f7598482602342955b2ff8b8 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- The newly-added struct bpf_iter_task has a name collision with a selftest for the seq_file task iter's bpf skel, so the selftests/bpf/progs file is renamed in order to avoid the collision. Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20231018061746.111364-8-zhouchuyi@bytedance.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- .../selftests/bpf/prog_tests/bpf_iter.c | 18 +++++++++--------- .../{bpf_iter_task.c => bpf_iter_tasks.c} | 0 2 files changed, 9 insertions(+), 9 deletions(-) rename tools/testing/selftests/bpf/progs/{bpf_iter_task.c => bpf_iter_tasks.c} (100%) diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c index caea3c500cc0..2cacc8fa9697 100644 --- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c +++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c @@ -7,7 +7,7 @@ #include "bpf_iter_ipv6_route.skel.h" #include "bpf_iter_netlink.skel.h" #include "bpf_iter_bpf_map.skel.h" -#include "bpf_iter_task.skel.h" +#include "bpf_iter_tasks.skel.h" #include "bpf_iter_task_stack.skel.h" #include "bpf_iter_task_file.skel.h" #include "bpf_iter_task_vmas.skel.h" @@ -215,12 +215,12 @@ static void *do_nothing_wait(void *arg) static void test_task_common_nocheck(struct bpf_iter_attach_opts *opts, int *num_unknown, int *num_known) { - struct bpf_iter_task *skel; + struct bpf_iter_tasks *skel; pthread_t thread_id; void *ret; - skel = bpf_iter_task__open_and_load(); - if (!ASSERT_OK_PTR(skel, "bpf_iter_task__open_and_load")) + skel = bpf_iter_tasks__open_and_load(); + if (!ASSERT_OK_PTR(skel, "bpf_iter_tasks__open_and_load")) return; ASSERT_OK(pthread_mutex_lock(&do_nothing_mutex), "pthread_mutex_lock"); @@ -239,7 +239,7 @@ static void test_task_common_nocheck(struct bpf_iter_attach_opts *opts, ASSERT_FALSE(pthread_join(thread_id, &ret) || ret != NULL, "pthread_join"); - bpf_iter_task__destroy(skel); + bpf_iter_tasks__destroy(skel); } static void test_task_common(struct bpf_iter_attach_opts *opts, int num_unknown, int num_known) @@ -307,10 +307,10 @@ static void test_task_pidfd(void) static void test_task_sleepable(void) { - struct bpf_iter_task *skel; + struct bpf_iter_tasks *skel; - skel = bpf_iter_task__open_and_load(); - if (!ASSERT_OK_PTR(skel, "bpf_iter_task__open_and_load")) + skel = bpf_iter_tasks__open_and_load(); + if (!ASSERT_OK_PTR(skel, "bpf_iter_tasks__open_and_load")) return; do_dummy_read(skel->progs.dump_task_sleepable); @@ -320,7 +320,7 @@ static void test_task_sleepable(void) ASSERT_GT(skel->bss->num_success_copy_from_user_task, 0, "num_success_copy_from_user_task"); - bpf_iter_task__destroy(skel); + bpf_iter_tasks__destroy(skel); } static void test_task_stack(void) diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_task.c b/tools/testing/selftests/bpf/progs/bpf_iter_tasks.c similarity index 100% rename from tools/testing/selftests/bpf/progs/bpf_iter_task.c rename to tools/testing/selftests/bpf/progs/bpf_iter_tasks.c -- 2.34.1
From: Chuyi Zhou <zhouchuyi@bytedance.com> mainline inclusion from mainline-v6.7-rc1 commit 130e0f7af9fc7388b90fb016ca13ff3840c48d4a category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- This patch adds 4 subtests to demonstrate these patterns and validating correctness. subtest1: 1) We use task_iter to iterate all process in the system and search for the current process with a given pid. 2) We create some threads in current process context, and use BPF_TASK_ITER_PROC_THREADS to iterate all threads of current process. As expected, we would find all the threads of current process. 3) We create some threads and use BPF_TASK_ITER_ALL_THREADS to iterate all threads in the system. As expected, we would find all the threads which was created. subtest2: We create a cgroup and add the current task to the cgroup. In the BPF program, we would use bpf_for_each(css_task, task, css) to iterate all tasks under the cgroup. As expected, we would find the current process. subtest3: 1) We create a cgroup tree. In the BPF program, we use bpf_for_each(css, pos, root, XXX) to iterate all descendant under the root with pre and post order. As expected, we would find all descendant and the last iterating cgroup in post-order is root cgroup, the first iterating cgroup in pre-order is root cgroup. 2) We wse BPF_CGROUP_ITER_ANCESTORS_UP to traverse the cgroup tree starting from leaf and root separately, and record the height. The diff of the hights would be the total tree-high - 1. subtest4: Add some failure testcase when using css_task, task and css iters, e.g, unlock when using task-iters to iterate tasks. Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com> Link: https://lore.kernel.org/r/20231018061746.111364-9-zhouchuyi@bytedance.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- .../testing/selftests/bpf/prog_tests/iters.c | 150 ++++++++++++++++++ tools/testing/selftests/bpf/progs/iters_css.c | 72 +++++++++ .../selftests/bpf/progs/iters_css_task.c | 47 ++++++ .../testing/selftests/bpf/progs/iters_task.c | 41 +++++ .../selftests/bpf/progs/iters_task_failure.c | 105 ++++++++++++ 5 files changed, 415 insertions(+) create mode 100644 tools/testing/selftests/bpf/progs/iters_css.c create mode 100644 tools/testing/selftests/bpf/progs/iters_css_task.c create mode 100644 tools/testing/selftests/bpf/progs/iters_task.c create mode 100644 tools/testing/selftests/bpf/progs/iters_task_failure.c diff --git a/tools/testing/selftests/bpf/prog_tests/iters.c b/tools/testing/selftests/bpf/prog_tests/iters.c index b696873c5455..c2425791c923 100644 --- a/tools/testing/selftests/bpf/prog_tests/iters.c +++ b/tools/testing/selftests/bpf/prog_tests/iters.c @@ -1,7 +1,14 @@ // SPDX-License-Identifier: GPL-2.0 /* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. 
*/ +#include <sys/syscall.h> +#include <sys/mman.h> +#include <sys/wait.h> +#include <unistd.h> +#include <malloc.h> +#include <stdlib.h> #include <test_progs.h> +#include "cgroup_helpers.h" #include "iters.skel.h" #include "iters_state_safety.skel.h" @@ -9,6 +16,10 @@ #include "iters_num.skel.h" #include "iters_testmod_seq.skel.h" #include "iters_task_vma.skel.h" +#include "iters_task.skel.h" +#include "iters_css_task.skel.h" +#include "iters_css.skel.h" +#include "iters_task_failure.skel.h" static void subtest_num_iters(void) { @@ -146,6 +157,138 @@ static void subtest_task_vma_iters(void) iters_task_vma__destroy(skel); } +static pthread_mutex_t do_nothing_mutex; + +static void *do_nothing_wait(void *arg) +{ + pthread_mutex_lock(&do_nothing_mutex); + pthread_mutex_unlock(&do_nothing_mutex); + + pthread_exit(arg); +} + +#define thread_num 2 + +static void subtest_task_iters(void) +{ + struct iters_task *skel = NULL; + pthread_t thread_ids[thread_num]; + void *ret; + int err; + + skel = iters_task__open_and_load(); + if (!ASSERT_OK_PTR(skel, "open_and_load")) + goto cleanup; + skel->bss->target_pid = getpid(); + err = iters_task__attach(skel); + if (!ASSERT_OK(err, "iters_task__attach")) + goto cleanup; + pthread_mutex_lock(&do_nothing_mutex); + for (int i = 0; i < thread_num; i++) + ASSERT_OK(pthread_create(&thread_ids[i], NULL, &do_nothing_wait, NULL), + "pthread_create"); + + syscall(SYS_getpgid); + iters_task__detach(skel); + ASSERT_EQ(skel->bss->procs_cnt, 1, "procs_cnt"); + ASSERT_EQ(skel->bss->threads_cnt, thread_num + 1, "threads_cnt"); + ASSERT_EQ(skel->bss->proc_threads_cnt, thread_num + 1, "proc_threads_cnt"); + pthread_mutex_unlock(&do_nothing_mutex); + for (int i = 0; i < thread_num; i++) + ASSERT_OK(pthread_join(thread_ids[i], &ret), "pthread_join"); +cleanup: + iters_task__destroy(skel); +} + +extern int stack_mprotect(void); + +static void subtest_css_task_iters(void) +{ + struct iters_css_task *skel = NULL; + int err, cg_fd, cg_id; + const char *cgrp_path = "/cg1"; + + err = setup_cgroup_environment(); + if (!ASSERT_OK(err, "setup_cgroup_environment")) + goto cleanup; + cg_fd = create_and_get_cgroup(cgrp_path); + if (!ASSERT_GE(cg_fd, 0, "create_and_get_cgroup")) + goto cleanup; + cg_id = get_cgroup_id(cgrp_path); + err = join_cgroup(cgrp_path); + if (!ASSERT_OK(err, "join_cgroup")) + goto cleanup; + + skel = iters_css_task__open_and_load(); + if (!ASSERT_OK_PTR(skel, "open_and_load")) + goto cleanup; + + skel->bss->target_pid = getpid(); + skel->bss->cg_id = cg_id; + err = iters_css_task__attach(skel); + if (!ASSERT_OK(err, "iters_task__attach")) + goto cleanup; + err = stack_mprotect(); + if (!ASSERT_EQ(err, -1, "stack_mprotect") || + !ASSERT_EQ(errno, EPERM, "stack_mprotect")) + goto cleanup; + iters_css_task__detach(skel); + ASSERT_EQ(skel->bss->css_task_cnt, 1, "css_task_cnt"); + +cleanup: + cleanup_cgroup_environment(); + iters_css_task__destroy(skel); +} + +static void subtest_css_iters(void) +{ + struct iters_css *skel = NULL; + struct { + const char *path; + int fd; + } cgs[] = { + { "/cg1" }, + { "/cg1/cg2" }, + { "/cg1/cg2/cg3" }, + { "/cg1/cg2/cg3/cg4" }, + }; + int err, cg_nr = ARRAY_SIZE(cgs); + int i; + + err = setup_cgroup_environment(); + if (!ASSERT_OK(err, "setup_cgroup_environment")) + goto cleanup; + for (i = 0; i < cg_nr; i++) { + cgs[i].fd = create_and_get_cgroup(cgs[i].path); + if (!ASSERT_GE(cgs[i].fd, 0, "create_and_get_cgroup")) + goto cleanup; + } + + skel = iters_css__open_and_load(); + if (!ASSERT_OK_PTR(skel, "open_and_load")) + goto cleanup; 
+ + skel->bss->target_pid = getpid(); + skel->bss->root_cg_id = get_cgroup_id(cgs[0].path); + skel->bss->leaf_cg_id = get_cgroup_id(cgs[cg_nr - 1].path); + err = iters_css__attach(skel); + + if (!ASSERT_OK(err, "iters_task__attach")) + goto cleanup; + + syscall(SYS_getpgid); + ASSERT_EQ(skel->bss->pre_order_cnt, cg_nr, "pre_order_cnt"); + ASSERT_EQ(skel->bss->first_cg_id, get_cgroup_id(cgs[0].path), "first_cg_id"); + + ASSERT_EQ(skel->bss->post_order_cnt, cg_nr, "post_order_cnt"); + ASSERT_EQ(skel->bss->last_cg_id, get_cgroup_id(cgs[0].path), "last_cg_id"); + ASSERT_EQ(skel->bss->tree_high, cg_nr - 1, "tree_high"); + iters_css__detach(skel); +cleanup: + cleanup_cgroup_environment(); + iters_css__destroy(skel); +} + void test_iters(void) { RUN_TESTS(iters_state_safety); @@ -161,4 +304,11 @@ void test_iters(void) subtest_testmod_seq_iters(); if (test__start_subtest("task_vma")) subtest_task_vma_iters(); + if (test__start_subtest("task")) + subtest_task_iters(); + if (test__start_subtest("css_task")) + subtest_css_task_iters(); + if (test__start_subtest("css")) + subtest_css_iters(); + RUN_TESTS(iters_task_failure); } diff --git a/tools/testing/selftests/bpf/progs/iters_css.c b/tools/testing/selftests/bpf/progs/iters_css.c new file mode 100644 index 000000000000..ec1f6c2f590b --- /dev/null +++ b/tools/testing/selftests/bpf/progs/iters_css.c @@ -0,0 +1,72 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (C) 2023 Chuyi Zhou <zhouchuyi@bytedance.com> */ + +#include "vmlinux.h" +#include <bpf/bpf_helpers.h> +#include <bpf/bpf_tracing.h> +#include "bpf_misc.h" +#include "bpf_experimental.h" + +char _license[] SEC("license") = "GPL"; + +pid_t target_pid; +u64 root_cg_id, leaf_cg_id; +u64 first_cg_id, last_cg_id; + +int pre_order_cnt, post_order_cnt, tree_high; + +struct cgroup *bpf_cgroup_from_id(u64 cgid) __ksym; +void bpf_cgroup_release(struct cgroup *p) __ksym; +void bpf_rcu_read_lock(void) __ksym; +void bpf_rcu_read_unlock(void) __ksym; + +SEC("fentry.s/" SYS_PREFIX "sys_getpgid") +int iter_css_for_each(const void *ctx) +{ + struct task_struct *cur_task = bpf_get_current_task_btf(); + struct cgroup_subsys_state *root_css, *leaf_css, *pos; + struct cgroup *root_cgrp, *leaf_cgrp, *cur_cgrp; + + if (cur_task->pid != target_pid) + return 0; + + root_cgrp = bpf_cgroup_from_id(root_cg_id); + + if (!root_cgrp) + return 0; + + leaf_cgrp = bpf_cgroup_from_id(leaf_cg_id); + + if (!leaf_cgrp) { + bpf_cgroup_release(root_cgrp); + return 0; + } + root_css = &root_cgrp->self; + leaf_css = &leaf_cgrp->self; + pre_order_cnt = post_order_cnt = tree_high = 0; + first_cg_id = last_cg_id = 0; + + bpf_rcu_read_lock(); + bpf_for_each(css, pos, root_css, BPF_CGROUP_ITER_DESCENDANTS_POST) { + cur_cgrp = pos->cgroup; + post_order_cnt++; + last_cg_id = cur_cgrp->kn->id; + } + + bpf_for_each(css, pos, root_css, BPF_CGROUP_ITER_DESCENDANTS_PRE) { + cur_cgrp = pos->cgroup; + pre_order_cnt++; + if (!first_cg_id) + first_cg_id = cur_cgrp->kn->id; + } + + bpf_for_each(css, pos, leaf_css, BPF_CGROUP_ITER_ANCESTORS_UP) + tree_high++; + + bpf_for_each(css, pos, root_css, BPF_CGROUP_ITER_ANCESTORS_UP) + tree_high--; + bpf_rcu_read_unlock(); + bpf_cgroup_release(root_cgrp); + bpf_cgroup_release(leaf_cgrp); + return 0; +} diff --git a/tools/testing/selftests/bpf/progs/iters_css_task.c b/tools/testing/selftests/bpf/progs/iters_css_task.c new file mode 100644 index 000000000000..5089ce384a1c --- /dev/null +++ b/tools/testing/selftests/bpf/progs/iters_css_task.c @@ -0,0 +1,47 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright 
(C) 2023 Chuyi Zhou <zhouchuyi@bytedance.com> */ + +#include "vmlinux.h" +#include <errno.h> +#include <bpf/bpf_helpers.h> +#include <bpf/bpf_tracing.h> +#include "bpf_misc.h" +#include "bpf_experimental.h" + +char _license[] SEC("license") = "GPL"; + +struct cgroup *bpf_cgroup_from_id(u64 cgid) __ksym; +void bpf_cgroup_release(struct cgroup *p) __ksym; + +pid_t target_pid; +int css_task_cnt; +u64 cg_id; + +SEC("lsm/file_mprotect") +int BPF_PROG(iter_css_task_for_each, struct vm_area_struct *vma, + unsigned long reqprot, unsigned long prot, int ret) +{ + struct task_struct *cur_task = bpf_get_current_task_btf(); + struct cgroup_subsys_state *css; + struct task_struct *task; + struct cgroup *cgrp; + + if (cur_task->pid != target_pid) + return ret; + + cgrp = bpf_cgroup_from_id(cg_id); + + if (!cgrp) + return -EPERM; + + css = &cgrp->self; + css_task_cnt = 0; + + bpf_for_each(css_task, task, css, CSS_TASK_ITER_PROCS) + if (task->pid == target_pid) + css_task_cnt++; + + bpf_cgroup_release(cgrp); + + return -EPERM; +} diff --git a/tools/testing/selftests/bpf/progs/iters_task.c b/tools/testing/selftests/bpf/progs/iters_task.c new file mode 100644 index 000000000000..c9b4055cd410 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/iters_task.c @@ -0,0 +1,41 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (C) 2023 Chuyi Zhou <zhouchuyi@bytedance.com> */ + +#include "vmlinux.h" +#include <bpf/bpf_helpers.h> +#include <bpf/bpf_tracing.h> +#include "bpf_misc.h" +#include "bpf_experimental.h" + +char _license[] SEC("license") = "GPL"; + +pid_t target_pid; +int procs_cnt, threads_cnt, proc_threads_cnt; + +void bpf_rcu_read_lock(void) __ksym; +void bpf_rcu_read_unlock(void) __ksym; + +SEC("fentry.s/" SYS_PREFIX "sys_getpgid") +int iter_task_for_each_sleep(void *ctx) +{ + struct task_struct *cur_task = bpf_get_current_task_btf(); + struct task_struct *pos; + + if (cur_task->pid != target_pid) + return 0; + procs_cnt = threads_cnt = proc_threads_cnt = 0; + + bpf_rcu_read_lock(); + bpf_for_each(task, pos, NULL, BPF_TASK_ITER_ALL_PROCS) + if (pos->pid == target_pid) + procs_cnt++; + + bpf_for_each(task, pos, cur_task, BPF_TASK_ITER_PROC_THREADS) + proc_threads_cnt++; + + bpf_for_each(task, pos, NULL, BPF_TASK_ITER_ALL_THREADS) + if (pos->tgid == target_pid) + threads_cnt++; + bpf_rcu_read_unlock(); + return 0; +} diff --git a/tools/testing/selftests/bpf/progs/iters_task_failure.c b/tools/testing/selftests/bpf/progs/iters_task_failure.c new file mode 100644 index 000000000000..c3bf96a67dba --- /dev/null +++ b/tools/testing/selftests/bpf/progs/iters_task_failure.c @@ -0,0 +1,105 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (C) 2023 Chuyi Zhou <zhouchuyi@bytedance.com> */ + +#include "vmlinux.h" +#include <bpf/bpf_helpers.h> +#include <bpf/bpf_tracing.h> +#include "bpf_misc.h" +#include "bpf_experimental.h" + +char _license[] SEC("license") = "GPL"; + +struct cgroup *bpf_cgroup_from_id(u64 cgid) __ksym; +void bpf_cgroup_release(struct cgroup *p) __ksym; +void bpf_rcu_read_lock(void) __ksym; +void bpf_rcu_read_unlock(void) __ksym; + +SEC("?fentry.s/" SYS_PREFIX "sys_getpgid") +__failure __msg("expected an RCU CS when using bpf_iter_task_next") +int BPF_PROG(iter_tasks_without_lock) +{ + struct task_struct *pos; + + bpf_for_each(task, pos, NULL, BPF_TASK_ITER_ALL_PROCS) { + + } + return 0; +} + +SEC("?fentry.s/" SYS_PREFIX "sys_getpgid") +__failure __msg("expected an RCU CS when using bpf_iter_css_next") +int BPF_PROG(iter_css_without_lock) +{ + u64 cg_id = bpf_get_current_cgroup_id(); + struct 
cgroup *cgrp = bpf_cgroup_from_id(cg_id); + struct cgroup_subsys_state *root_css, *pos; + + if (!cgrp) + return 0; + root_css = &cgrp->self; + + bpf_for_each(css, pos, root_css, BPF_CGROUP_ITER_DESCENDANTS_POST) { + + } + bpf_cgroup_release(cgrp); + return 0; +} + +SEC("?fentry.s/" SYS_PREFIX "sys_getpgid") +__failure __msg("expected an RCU CS when using bpf_iter_task_next") +int BPF_PROG(iter_tasks_lock_and_unlock) +{ + struct task_struct *pos; + + bpf_rcu_read_lock(); + bpf_for_each(task, pos, NULL, BPF_TASK_ITER_ALL_PROCS) { + bpf_rcu_read_unlock(); + + bpf_rcu_read_lock(); + } + bpf_rcu_read_unlock(); + return 0; +} + +SEC("?fentry.s/" SYS_PREFIX "sys_getpgid") +__failure __msg("expected an RCU CS when using bpf_iter_css_next") +int BPF_PROG(iter_css_lock_and_unlock) +{ + u64 cg_id = bpf_get_current_cgroup_id(); + struct cgroup *cgrp = bpf_cgroup_from_id(cg_id); + struct cgroup_subsys_state *root_css, *pos; + + if (!cgrp) + return 0; + root_css = &cgrp->self; + + bpf_rcu_read_lock(); + bpf_for_each(css, pos, root_css, BPF_CGROUP_ITER_DESCENDANTS_POST) { + bpf_rcu_read_unlock(); + + bpf_rcu_read_lock(); + } + bpf_rcu_read_unlock(); + bpf_cgroup_release(cgrp); + return 0; +} + +SEC("?fentry.s/" SYS_PREFIX "sys_getpgid") +__failure __msg("css_task_iter is only allowed in bpf_lsm and bpf iter-s") +int BPF_PROG(iter_css_task_for_each) +{ + u64 cg_id = bpf_get_current_cgroup_id(); + struct cgroup *cgrp = bpf_cgroup_from_id(cg_id); + struct cgroup_subsys_state *css; + struct task_struct *task; + + if (cgrp == NULL) + return 0; + css = &cgrp->self; + + bpf_for_each(css_task, task, css, CSS_TASK_ITER_PROCS) { + + } + bpf_cgroup_release(cgrp); + return 0; +} -- 2.34.1
From: Masahiro Yamada <masahiroy@kernel.org> mainline inclusion from mainline-v6.7-rc1 commit 72d091846de935e0942a8a0f1fe24ff739d85d76 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- scripts/pahole-flags.sh is executed so many times. You can confirm it, as follows: $ cat <<EOF >> scripts/pahole-flags.sh
echo "scripts/pahole-flags.sh was executed" >&2 EOF
$ make -s scripts/pahole-flags.sh was executed scripts/pahole-flags.sh was executed scripts/pahole-flags.sh was executed scripts/pahole-flags.sh was executed scripts/pahole-flags.sh was executed [ lots of repeated lines... ] This scripts is executed more than 20 times during the kernel build because PAHOLE_FLAGS is a recursively expanded variable and exported to sub-processes. With GNU Make >= 4.4, it is executed more than 60 times because exported variables are also passed to other $(shell ) invocations. Without careful coding, it is known to cause an exponential fork explosion. [1] The use of $(shell ) in an exported recursive variable is likely wrong because $(shell ) is always evaluated due to the 'export' keyword, and the evaluation can occur multiple times by the nature of recursive variables. Convert the shell script to a Makefile, which is included only when CONFIG_DEBUG_INFO_BTF=y. [1]: https://savannah.gnu.org/bugs/index.php?64746 Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Tested-by: Alan Maguire <alan.maguire@oracle.com> Reviewed-by: Nicolas Schier <n.schier@avm.de> Tested-by: Miguel Ojeda <ojeda@kernel.org> Acked-by: Miguel Ojeda <ojeda@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Reviewed-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- MAINTAINERS | 2 +- Makefile | 4 +--- scripts/Makefile.btf | 19 +++++++++++++++++++ scripts/pahole-flags.sh | 30 ------------------------------ 4 files changed, 21 insertions(+), 34 deletions(-) create mode 100644 scripts/Makefile.btf delete mode 100755 scripts/pahole-flags.sh diff --git a/MAINTAINERS b/MAINTAINERS index 6358190b367d..ae946c81c58b 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3774,7 +3774,7 @@ F: net/sched/act_bpf.c F: net/sched/cls_bpf.c F: samples/bpf/ F: scripts/bpf_doc.py -F: scripts/pahole-flags.sh +F: scripts/Makefile.btf F: scripts/pahole-version.sh F: tools/bpf/ F: tools/lib/bpf/ diff --git a/Makefile b/Makefile index 7c4955cf5f71..b62b39c18643 100644 --- a/Makefile +++ b/Makefile @@ -522,8 +522,6 @@ LZ4 = lz4 XZ = xz ZSTD = zstd -PAHOLE_FLAGS = $(shell PAHOLE=$(PAHOLE) $(srctree)/scripts/pahole-flags.sh) - CHECKFLAGS := -D__linux__ -Dlinux -D__STDC__ -Dunix -D__unix__ \ -Wbitwise -Wno-return-void -Wno-unknown-attribute $(CF) NOSTDINC_FLAGS := @@ -614,7 +612,6 @@ export KBUILD_RUSTFLAGS RUSTFLAGS_KERNEL RUSTFLAGS_MODULE export KBUILD_AFLAGS AFLAGS_KERNEL AFLAGS_MODULE export KBUILD_AFLAGS_MODULE KBUILD_CFLAGS_MODULE KBUILD_RUSTFLAGS_MODULE KBUILD_LDFLAGS_MODULE export KBUILD_AFLAGS_KERNEL KBUILD_CFLAGS_KERNEL KBUILD_RUSTFLAGS_KERNEL -export PAHOLE_FLAGS # Files to ignore in find ... 
statements @@ -1015,6 +1012,7 @@ KBUILD_CPPFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=) # include additional Makefiles when needed include-y := scripts/Makefile.extrawarn include-$(CONFIG_DEBUG_INFO) += scripts/Makefile.debug +include-$(CONFIG_DEBUG_INFO_BTF)+= scripts/Makefile.btf include-$(CONFIG_KASAN) += scripts/Makefile.kasan include-$(CONFIG_KCSAN) += scripts/Makefile.kcsan include-$(CONFIG_KMSAN) += scripts/Makefile.kmsan diff --git a/scripts/Makefile.btf b/scripts/Makefile.btf new file mode 100644 index 000000000000..82377e470aed --- /dev/null +++ b/scripts/Makefile.btf @@ -0,0 +1,19 @@ +# SPDX-License-Identifier: GPL-2.0 + +pahole-ver := $(CONFIG_PAHOLE_VERSION) +pahole-flags-y := + +# pahole 1.18 through 1.21 can't handle zero-sized per-CPU vars +ifeq ($(call test-le, $(pahole-ver), 121),y) +pahole-flags-$(call test-ge, $(pahole-ver), 118) += --skip_encoding_btf_vars +endif + +pahole-flags-$(call test-ge, $(pahole-ver), 121) += --btf_gen_floats + +pahole-flags-$(call test-ge, $(pahole-ver), 122) += -j + +pahole-flags-$(CONFIG_PAHOLE_HAS_LANG_EXCLUDE) += --lang_exclude=rust + +pahole-flags-$(call test-ge, $(pahole-ver), 125) += --skip_encoding_btf_inconsistent_proto --btf_gen_optimized + +export PAHOLE_FLAGS := $(pahole-flags-y) diff --git a/scripts/pahole-flags.sh b/scripts/pahole-flags.sh deleted file mode 100755 index 728d55190d97..000000000000 --- a/scripts/pahole-flags.sh +++ /dev/null @@ -1,30 +0,0 @@ -#!/bin/sh -# SPDX-License-Identifier: GPL-2.0 - -extra_paholeopt= - -if ! [ -x "$(command -v ${PAHOLE})" ]; then - exit 0 -fi - -pahole_ver=$($(dirname $0)/pahole-version.sh ${PAHOLE}) - -if [ "${pahole_ver}" -ge "118" ] && [ "${pahole_ver}" -le "121" ]; then - # pahole 1.18 through 1.21 can't handle zero-sized per-CPU vars - extra_paholeopt="${extra_paholeopt} --skip_encoding_btf_vars" -fi -if [ "${pahole_ver}" -ge "121" ]; then - extra_paholeopt="${extra_paholeopt} --btf_gen_floats" -fi -if [ "${pahole_ver}" -ge "122" ]; then - extra_paholeopt="${extra_paholeopt} -j" -fi -if [ "${pahole_ver}" -ge "124" ]; then - # see PAHOLE_HAS_LANG_EXCLUDE - extra_paholeopt="${extra_paholeopt} --lang_exclude=rust" -fi -if [ "${pahole_ver}" -ge "125" ]; then - extra_paholeopt="${extra_paholeopt} --skip_encoding_btf_inconsistent_proto --btf_gen_optimized" -fi - -echo ${extra_paholeopt} -- 2.34.1
From: Matthieu Baerts <matttbe@kernel.org> mainline inclusion from mainline-v6.7-rc1 commit 05670f81d1287c40ec861186e4c4e3401013e7fb category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Our MPTCP CI complained [1] -- and KBuild too -- that it was no longer possible to build the kernel without CONFIG_CGROUPS: kernel/bpf/task_iter.c: In function 'bpf_iter_css_task_new': kernel/bpf/task_iter.c:919:14: error: 'CSS_TASK_ITER_PROCS' undeclared (first use in this function) 919 | case CSS_TASK_ITER_PROCS | CSS_TASK_ITER_THREADED: | ^~~~~~~~~~~~~~~~~~~ kernel/bpf/task_iter.c:919:14: note: each undeclared identifier is reported only once for each function it appears in kernel/bpf/task_iter.c:919:36: error: 'CSS_TASK_ITER_THREADED' undeclared (first use in this function) 919 | case CSS_TASK_ITER_PROCS | CSS_TASK_ITER_THREADED: | ^~~~~~~~~~~~~~~~~~~~~~ kernel/bpf/task_iter.c:927:60: error: invalid application of 'sizeof' to incomplete type 'struct css_task_iter' 927 | kit->css_it = bpf_mem_alloc(&bpf_global_ma, sizeof(struct css_task_iter)); | ^~~~~~ kernel/bpf/task_iter.c:930:9: error: implicit declaration of function 'css_task_iter_start'; did you mean 'task_seq_start'? [-Werror=implicit-function-declaration] 930 | css_task_iter_start(css, flags, kit->css_it); | ^~~~~~~~~~~~~~~~~~~ | task_seq_start kernel/bpf/task_iter.c: In function 'bpf_iter_css_task_next': kernel/bpf/task_iter.c:940:16: error: implicit declaration of function 'css_task_iter_next'; did you mean 'class_dev_iter_next'? [-Werror=implicit-function-declaration] 940 | return css_task_iter_next(kit->css_it); | ^~~~~~~~~~~~~~~~~~ | class_dev_iter_next kernel/bpf/task_iter.c:940:16: error: returning 'int' from a function with return type 'struct task_struct *' makes pointer from integer without a cast [-Werror=int-conversion] 940 | return css_task_iter_next(kit->css_it); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ kernel/bpf/task_iter.c: In function 'bpf_iter_css_task_destroy': kernel/bpf/task_iter.c:949:9: error: implicit declaration of function 'css_task_iter_end' [-Werror=implicit-function-declaration] 949 | css_task_iter_end(kit->css_it); | ^~~~~~~~~~~~~~~~~ This patch simply surrounds with a #ifdef the new code requiring CGroups support. It seems enough for the compiler and this is similar to bpf_iter_css_{new,next,destroy}() functions where no other #ifdef have been added in kernel/bpf/helpers.c and in the selftests. Fixes: 9c66dc94b62a ("bpf: Introduce css_task open-coded iterator kfuncs") Link: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/6665206927 Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202310260528.aHWgVFqq-lkp@intel.com/ Signed-off-by: Matthieu Baerts <matttbe@kernel.org> [ added missing ifdefs for BTF_ID cgroup definitions ] Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20231101181601.1493271-1-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Conflicts: kernel/bpf/verifier.c [Fix conflict because missing some kfuncs and hisock.] 
Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- kernel/bpf/helpers.c | 8 +++++--- kernel/bpf/task_iter.c | 4 ++++ kernel/bpf/verifier.c | 12 ++++++++++-- 3 files changed, 19 insertions(+), 5 deletions(-) diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 112588659104..73e1e0b71ef3 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -2674,15 +2674,17 @@ BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY) BTF_ID_FLAGS(func, bpf_iter_task_vma_new, KF_ITER_NEW | KF_RCU) BTF_ID_FLAGS(func, bpf_iter_task_vma_next, KF_ITER_NEXT | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_iter_task_vma_destroy, KF_ITER_DESTROY) +#ifdef CONFIG_CGROUPS BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS) BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY) -BTF_ID_FLAGS(func, bpf_iter_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS | KF_RCU_PROTECTED) -BTF_ID_FLAGS(func, bpf_iter_task_next, KF_ITER_NEXT | KF_RET_NULL) -BTF_ID_FLAGS(func, bpf_iter_task_destroy, KF_ITER_DESTROY) BTF_ID_FLAGS(func, bpf_iter_css_new, KF_ITER_NEW | KF_TRUSTED_ARGS | KF_RCU_PROTECTED) BTF_ID_FLAGS(func, bpf_iter_css_next, KF_ITER_NEXT | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_iter_css_destroy, KF_ITER_DESTROY) +#endif +BTF_ID_FLAGS(func, bpf_iter_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS | KF_RCU_PROTECTED) +BTF_ID_FLAGS(func, bpf_iter_task_next, KF_ITER_NEXT | KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_iter_task_destroy, KF_ITER_DESTROY) BTF_ID_FLAGS(func, bpf_dynptr_adjust) BTF_ID_FLAGS(func, bpf_dynptr_is_null) BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly) diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c index a663c3d11b62..e0793906bf13 100644 --- a/kernel/bpf/task_iter.c +++ b/kernel/bpf/task_iter.c @@ -914,6 +914,8 @@ __bpf_kfunc void bpf_iter_task_vma_destroy(struct bpf_iter_task_vma *it) __diag_pop(); +#ifdef CONFIG_CGROUPS + struct bpf_iter_css_task { __u64 __opaque[1]; } __attribute__((aligned(8))); @@ -972,6 +974,8 @@ __bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it) __diag_pop(); +#endif /* CONFIG_CGROUPS */ + struct bpf_iter_task { __u64 __opaque[3]; } __attribute__((aligned(8))); diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index b0e078f2d9c4..fb230e9a5459 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -5420,7 +5420,9 @@ static bool in_rcu_cs(struct bpf_verifier_env *env) /* Once GCC supports btf_type_tag the following mechanism will be replaced with tag check */ BTF_SET_START(rcu_protected_types) BTF_ID(struct, prog_test_ref_kfunc) +#ifdef CONFIG_CGROUPS BTF_ID(struct, cgroup) +#endif BTF_ID(struct, bpf_cpumask) BTF_ID(struct, task_struct) BTF_SET_END(rcu_protected_types) @@ -10877,7 +10879,6 @@ BTF_ID(func, bpf_dynptr_from_xdp) BTF_ID(func, bpf_dynptr_slice) BTF_ID(func, bpf_dynptr_slice_rdwr) BTF_ID(func, bpf_dynptr_clone) -BTF_ID(func, bpf_iter_css_task_new) #ifdef CONFIG_HISOCK BTF_ID(func, bpf_set_ingress_dst) BTF_ID(func, bpf_set_ingress_dev) @@ -10886,6 +10887,9 @@ BTF_ID(func, bpf_get_skb_ethhdr) BTF_ID(func, bpf_handle_ingress_ptype) BTF_ID(func, bpf_handle_egress_ptype) #endif +#ifdef CONFIG_CGROUPS +BTF_ID(func, bpf_iter_css_task_new) +#endif BTF_SET_END(special_kfunc_set) BTF_ID_LIST(special_kfunc_list) @@ -10908,7 +10912,6 @@ BTF_ID(func, bpf_dynptr_from_xdp) BTF_ID(func, bpf_dynptr_slice) BTF_ID(func, bpf_dynptr_slice_rdwr) BTF_ID(func, bpf_dynptr_clone) -BTF_ID(func, bpf_iter_css_task_new) #ifdef CONFIG_HISOCK BTF_ID(func, 
bpf_set_ingress_dst) BTF_ID(func, bpf_set_ingress_dev) @@ -10917,6 +10920,11 @@ BTF_ID(func, bpf_get_skb_ethhdr) BTF_ID(func, bpf_handle_ingress_ptype) BTF_ID(func, bpf_handle_egress_ptype) #endif +#ifdef CONFIG_CGROUPS +BTF_ID(func, bpf_iter_css_task_new) +#else +BTF_ID_UNUSED +#endif static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta) { -- 2.34.1
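One detail worth noting in the verifier.c hunk: special_kfunc_set is a BTF_SET, so its bpf_iter_css_task_new entry can simply be dropped when cgroups are disabled, but special_kfunc_list is a positional BTF_ID_LIST indexed by enum special_kfunc_type, so it needs the BTF_ID_UNUSED placeholder to keep later indices stable. A reduced sketch of that pattern, with illustrative names rather than the real ones:

/* kernel-side sketch; relies on <linux/btf_ids.h> */
enum example_kfunc_type {
	KF_example_dynptr_clone,
	KF_example_iter_css_task_new,	/* index must stay valid without cgroups */
};

BTF_ID_LIST(example_kfunc_list)
BTF_ID(func, bpf_dynptr_clone)
#ifdef CONFIG_CGROUPS
BTF_ID(func, bpf_iter_css_task_new)
#else
BTF_ID_UNUSED	/* placeholder keeps example_kfunc_list[1] well-defined */
#endif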
From: Dave Marchevsky <davemarchevsky@fb.com> mainline inclusion from mainline-v6.7-rc1 commit 391145ba2accc48b596f3d438af1a6255b62a555 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- BPF kfuncs are meant to be called from BPF programs. Accordingly, most kfuncs are not called from anywhere in the kernel, which the -Wmissing-prototypes warning is unhappy about. We've peppered __diag_ignore_all("-Wmissing-prototypes", ... everywhere kfuncs are defined in the codebase to suppress this warning. This patch adds two macros meant to bound one or many kfunc definitions. All existing kfunc definitions which use these __diag calls to suppress -Wmissing-prototypes are migrated to use the newly-introduced macros. A new __diag_ignore_all - for "-Wmissing-declarations" - is added to the __bpf_kfunc_start_defs macro based on feedback from Andrii on an earlier version of this patch [0] and another recent mailing list thread [1]. In the future we might need to ignore different warnings or do other kfunc-specific things. This change will make it easier to make such modifications for all kfunc defs. [0]: https://lore.kernel.org/bpf/CAEf4BzaE5dRWtK6RPLnjTW-MW9sx9K3Fn6uwqCTChK2Dcb1... [1]: https://lore.kernel.org/bpf/ZT+2qCc%2FaXep0%2FLf@krava/ Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Suggested-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Cc: Jiri Olsa <olsajiri@gmail.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: David Vernet <void@manifault.com> Acked-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/r/20231031215625.2343848-1-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Conflicts: kernel/bpf/helpers.c net/core/filter.c net/core/xdp.c [Fix conflicts because of missing some kfuncs.] Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- Documentation/bpf/kfuncs.rst | 6 ++---- include/linux/btf.h | 9 +++++++++ kernel/bpf/bpf_iter.c | 6 ++---- kernel/bpf/cgroup_iter.c | 6 ++---- kernel/bpf/cpumask.c | 6 ++---- kernel/bpf/helpers.c | 6 ++---- kernel/bpf/map_iter.c | 6 ++---- kernel/bpf/task_iter.c | 18 ++++++------------ kernel/trace/bpf_trace.c | 6 ++---- net/bpf/test_run.c | 7 +++---- net/core/filter.c | 13 ++++--------- net/core/xdp.c | 6 ++---- net/ipv4/fou_bpf.c | 6 ++---- net/netfilter/nf_conntrack_bpf.c | 6 ++---- net/netfilter/nf_nat_bpf.c | 6 ++---- net/xfrm/xfrm_interface_bpf.c | 6 ++---- 16 files changed, 46 insertions(+), 73 deletions(-) diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst index 0d2647fb358d..723408e399ab 100644 --- a/Documentation/bpf/kfuncs.rst +++ b/Documentation/bpf/kfuncs.rst @@ -37,16 +37,14 @@ prototype in a header for the wrapper kfunc. An example is given below:: /* Disables missing prototype warnings */ - __diag_push(); - __diag_ignore_all("-Wmissing-prototypes", - "Global kfuncs as their definitions will be in BTF"); + __bpf_kfunc_start_defs(); __bpf_kfunc struct task_struct *bpf_find_get_task_by_vpid(pid_t nr) { return find_get_task_by_vpid(nr); } - __diag_pop(); + __bpf_kfunc_end_defs(); A wrapper kfunc is often needed when we need to annotate parameters of the kfunc. 
Otherwise one may directly make the kfunc visible to the BPF program by diff --git a/include/linux/btf.h b/include/linux/btf.h index c2231c64d60b..dc5ce962f600 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -84,6 +84,15 @@ */ #define __bpf_kfunc __used noinline +#define __bpf_kfunc_start_defs() \ + __diag_push(); \ + __diag_ignore_all("-Wmissing-declarations", \ + "Global kfuncs as their definitions will be in BTF");\ + __diag_ignore_all("-Wmissing-prototypes", \ + "Global kfuncs as their definitions will be in BTF") + +#define __bpf_kfunc_end_defs() __diag_pop() + /* * Return the name of the passed struct, if exists, or halt the build if for * example the structure gets renamed. In this way, developers have to revisit diff --git a/kernel/bpf/bpf_iter.c b/kernel/bpf/bpf_iter.c index 833faa04461b..0fae79164187 100644 --- a/kernel/bpf/bpf_iter.c +++ b/kernel/bpf/bpf_iter.c @@ -782,9 +782,7 @@ struct bpf_iter_num_kern { int end; /* final value, exclusive */ } __aligned(8); -__diag_push(); -__diag_ignore_all("-Wmissing-prototypes", - "Global functions as their definitions will be in vmlinux BTF"); +__bpf_kfunc_start_defs(); __bpf_kfunc int bpf_iter_num_new(struct bpf_iter_num *it, int start, int end) { @@ -843,4 +841,4 @@ __bpf_kfunc void bpf_iter_num_destroy(struct bpf_iter_num *it) s->cur = s->end = 0; } -__diag_pop(); +__bpf_kfunc_end_defs(); diff --git a/kernel/bpf/cgroup_iter.c b/kernel/bpf/cgroup_iter.c index 209e5135f9fb..d1b5c5618dd7 100644 --- a/kernel/bpf/cgroup_iter.c +++ b/kernel/bpf/cgroup_iter.c @@ -305,9 +305,7 @@ struct bpf_iter_css_kern { unsigned int flags; } __attribute__((aligned(8))); -__diag_push(); -__diag_ignore_all("-Wmissing-prototypes", - "Global functions as their definitions will be in vmlinux BTF"); +__bpf_kfunc_start_defs(); __bpf_kfunc int bpf_iter_css_new(struct bpf_iter_css *it, struct cgroup_subsys_state *start, unsigned int flags) @@ -358,4 +356,4 @@ __bpf_kfunc void bpf_iter_css_destroy(struct bpf_iter_css *it) { } -__diag_pop(); \ No newline at end of file +__bpf_kfunc_end_defs(); diff --git a/kernel/bpf/cpumask.c b/kernel/bpf/cpumask.c index 6983af8e093c..e01c741e54e7 100644 --- a/kernel/bpf/cpumask.c +++ b/kernel/bpf/cpumask.c @@ -34,9 +34,7 @@ static bool cpu_valid(u32 cpu) return cpu < nr_cpu_ids; } -__diag_push(); -__diag_ignore_all("-Wmissing-prototypes", - "Global kfuncs as their definitions will be in BTF"); +__bpf_kfunc_start_defs(); /** * bpf_cpumask_create() - Create a mutable BPF cpumask. 
@@ -407,7 +405,7 @@ __bpf_kfunc u32 bpf_cpumask_any_and_distribute(const struct cpumask *src1, return cpumask_any_and_distribute(src1, src2); } -__diag_pop(); +__bpf_kfunc_end_defs(); BTF_SET8_START(cpumask_kfunc_btf_ids) BTF_ID_FLAGS(func, bpf_cpumask_create, KF_ACQUIRE | KF_RET_NULL) diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 73e1e0b71ef3..a17fee069161 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -2033,9 +2033,7 @@ void bpf_rb_root_free(const struct btf_field *field, void *rb_root, } } -__diag_push(); -__diag_ignore_all("-Wmissing-prototypes", - "Global functions as their definitions will be in vmlinux BTF"); +__bpf_kfunc_start_defs(); __bpf_kfunc void *bpf_obj_new_impl(u64 local_type_id__k, void *meta__ign) { @@ -2615,7 +2613,7 @@ __bpf_kfunc void bpf_rcu_read_unlock(void) rcu_read_unlock(); } -__diag_pop(); +__bpf_kfunc_end_defs(); BTF_SET8_START(generic_btf_ids) #ifdef CONFIG_KEXEC_CORE diff --git a/kernel/bpf/map_iter.c b/kernel/bpf/map_iter.c index 6fc9dae9edc8..6abd7c5df4b3 100644 --- a/kernel/bpf/map_iter.c +++ b/kernel/bpf/map_iter.c @@ -193,9 +193,7 @@ static int __init bpf_map_iter_init(void) late_initcall(bpf_map_iter_init); -__diag_push(); -__diag_ignore_all("-Wmissing-prototypes", - "Global functions as their definitions will be in vmlinux BTF"); +__bpf_kfunc_start_defs(); __bpf_kfunc s64 bpf_map_sum_elem_count(const struct bpf_map *map) { @@ -213,7 +211,7 @@ __bpf_kfunc s64 bpf_map_sum_elem_count(const struct bpf_map *map) return ret; } -__diag_pop(); +__bpf_kfunc_end_defs(); BTF_SET8_START(bpf_map_iter_kfunc_ids) BTF_ID_FLAGS(func, bpf_map_sum_elem_count, KF_TRUSTED_ARGS) diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c index e0793906bf13..ea8d2cf3f52c 100644 --- a/kernel/bpf/task_iter.c +++ b/kernel/bpf/task_iter.c @@ -844,9 +844,7 @@ struct bpf_iter_task_vma_kern { struct bpf_iter_task_vma_kern_data *data; } __attribute__((aligned(8))); -__diag_push(); -__diag_ignore_all("-Wmissing-prototypes", - "Global functions as their definitions will be in vmlinux BTF"); +__bpf_kfunc_start_defs(); __bpf_kfunc int bpf_iter_task_vma_new(struct bpf_iter_task_vma *it, struct task_struct *task, u64 addr) @@ -912,7 +910,7 @@ __bpf_kfunc void bpf_iter_task_vma_destroy(struct bpf_iter_task_vma *it) } } -__diag_pop(); +__bpf_kfunc_end_defs(); #ifdef CONFIG_CGROUPS @@ -924,9 +922,7 @@ struct bpf_iter_css_task_kern { struct css_task_iter *css_it; } __attribute__((aligned(8))); -__diag_push(); -__diag_ignore_all("-Wmissing-prototypes", - "Global functions as their definitions will be in vmlinux BTF"); +__bpf_kfunc_start_defs(); __bpf_kfunc int bpf_iter_css_task_new(struct bpf_iter_css_task *it, struct cgroup_subsys_state *css, unsigned int flags) @@ -972,7 +968,7 @@ __bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it) bpf_mem_free(&bpf_global_ma, kit->css_it); } -__diag_pop(); +__bpf_kfunc_end_defs(); #endif /* CONFIG_CGROUPS */ @@ -995,9 +991,7 @@ enum { BPF_TASK_ITER_PROC_THREADS }; -__diag_push(); -__diag_ignore_all("-Wmissing-prototypes", - "Global functions as their definitions will be in vmlinux BTF"); +__bpf_kfunc_start_defs(); __bpf_kfunc int bpf_iter_task_new(struct bpf_iter_task *it, struct task_struct *task__nullable, unsigned int flags) @@ -1067,7 +1061,7 @@ __bpf_kfunc void bpf_iter_task_destroy(struct bpf_iter_task *it) { } -__diag_pop(); +__bpf_kfunc_end_defs(); DEFINE_PER_CPU(struct mmap_unlock_irq_work, mmap_unlock_work); diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index 
50556b239023..357c7febd8ba 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -1256,9 +1256,7 @@ static const struct bpf_func_proto bpf_get_func_arg_cnt_proto = { }; #ifdef CONFIG_KEYS -__diag_push(); -__diag_ignore_all("-Wmissing-prototypes", - "kfuncs which will be used in BPF programs"); +__bpf_kfunc_start_defs(); /** * bpf_lookup_user_key - lookup a key by its serial @@ -1408,7 +1406,7 @@ __bpf_kfunc int bpf_verify_pkcs7_signature(struct bpf_dynptr_kern *data_ptr, } #endif /* CONFIG_SYSTEM_DATA_VERIFICATION */ -__diag_pop(); +__bpf_kfunc_end_defs(); BTF_SET8_START(key_sig_kfunc_set) BTF_ID_FLAGS(func, bpf_lookup_user_key, KF_ACQUIRE | KF_RET_NULL | KF_SLEEPABLE) diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index 3a1c82b797a3..6ccb767169fc 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -504,9 +504,8 @@ static int bpf_test_finish(const union bpf_attr *kattr, * architecture dependent calling conventions. 7+ can be supported in the * future. */ -__diag_push(); -__diag_ignore_all("-Wmissing-prototypes", - "Global functions as their definitions will be in vmlinux BTF"); +__bpf_kfunc_start_defs(); + __bpf_kfunc int bpf_fentry_test1(int a) { return a + 1; @@ -606,7 +605,7 @@ __bpf_kfunc void bpf_kfunc_call_memb_release(struct prog_test_member *p) { } -__diag_pop(); +__bpf_kfunc_end_defs(); BTF_SET8_START(bpf_test_modify_return_ids) BTF_ID_FLAGS(func, bpf_modify_return_test) diff --git a/net/core/filter.c b/net/core/filter.c index f91de361383a..fe54ad9b0e8f 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -12103,9 +12103,7 @@ bpf_sk_base_func_proto(enum bpf_func_id func_id) return func; } -__diag_push(); -__diag_ignore_all("-Wmissing-prototypes", - "Global functions as their definitions will be in vmlinux BTF"); +__bpf_kfunc_start_defs(); __bpf_kfunc int bpf_dynptr_from_skb(struct sk_buff *skb, u64 flags, struct bpf_dynptr_kern *ptr__uninit) { @@ -12258,7 +12256,7 @@ __bpf_kfunc void bpf_handle_egress_ptype(struct __sk_buff *skb_ctx) rcu_read_unlock(); } #endif -__diag_pop(); +__bpf_kfunc_end_defs(); int bpf_dynptr_from_skb_rdonly(struct sk_buff *skb, u64 flags, struct bpf_dynptr_kern *ptr__uninit) @@ -12342,10 +12340,7 @@ static int __init bpf_kfunc_init(void) } late_initcall(bpf_kfunc_init); -/* Disables missing prototype warnings */ -__diag_push(); -__diag_ignore_all("-Wmissing-prototypes", - "Global functions as their definitions will be in vmlinux BTF"); +__bpf_kfunc_start_defs(); /* bpf_sock_destroy: Destroy the given socket with ECONNABORTED error code. * @@ -12379,7 +12374,7 @@ __bpf_kfunc int bpf_sock_destroy(struct sock_common *sock) return sk->sk_prot->diag_destroy(sk, ECONNABORTED); } -__diag_pop() +__bpf_kfunc_end_defs(); BTF_SET8_START(bpf_sk_iter_kfunc_ids) BTF_ID_FLAGS(func, bpf_sock_destroy, KF_TRUSTED_ARGS) diff --git a/net/core/xdp.c b/net/core/xdp.c index 5ee3f8f165e5..1642222e350b 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -692,9 +692,7 @@ struct xdp_frame *xdpf_clone(struct xdp_frame *xdpf) return nxdpf; } -__diag_push(); -__diag_ignore_all("-Wmissing-prototypes", - "Global functions as their definitions will be in vmlinux BTF"); +__bpf_kfunc_start_defs(); /** * bpf_xdp_metadata_rx_timestamp - Read XDP frame RX timestamp. 
@@ -734,7 +732,7 @@ __bpf_kfunc int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, u32 *hash, return -EOPNOTSUPP; } -__diag_pop(); +__bpf_kfunc_end_defs(); BTF_SET8_START(xdp_metadata_kfunc_ids) #define XDP_METADATA_KFUNC(_, name) BTF_ID_FLAGS(func, name, KF_TRUSTED_ARGS) diff --git a/net/ipv4/fou_bpf.c b/net/ipv4/fou_bpf.c index 3760a14b6b57..4da03bf45c9b 100644 --- a/net/ipv4/fou_bpf.c +++ b/net/ipv4/fou_bpf.c @@ -22,9 +22,7 @@ enum bpf_fou_encap_type { FOU_BPF_ENCAP_GUE, }; -__diag_push(); -__diag_ignore_all("-Wmissing-prototypes", - "Global functions as their definitions will be in BTF"); +__bpf_kfunc_start_defs(); /* bpf_skb_set_fou_encap - Set FOU encap parameters * @@ -100,7 +98,7 @@ __bpf_kfunc int bpf_skb_get_fou_encap(struct __sk_buff *skb_ctx, return 0; } -__diag_pop() +__bpf_kfunc_end_defs(); BTF_SET8_START(fou_kfunc_set) BTF_ID_FLAGS(func, bpf_skb_set_fou_encap) diff --git a/net/netfilter/nf_conntrack_bpf.c b/net/netfilter/nf_conntrack_bpf.c index b21799d468d2..475358ec8212 100644 --- a/net/netfilter/nf_conntrack_bpf.c +++ b/net/netfilter/nf_conntrack_bpf.c @@ -230,9 +230,7 @@ static int _nf_conntrack_btf_struct_access(struct bpf_verifier_log *log, return 0; } -__diag_push(); -__diag_ignore_all("-Wmissing-prototypes", - "Global functions as their definitions will be in nf_conntrack BTF"); +__bpf_kfunc_start_defs(); /* bpf_xdp_ct_alloc - Allocate a new CT entry * @@ -467,7 +465,7 @@ __bpf_kfunc int bpf_ct_change_status(struct nf_conn *nfct, u32 status) return nf_ct_change_status_common(nfct, status); } -__diag_pop() +__bpf_kfunc_end_defs(); BTF_SET8_START(nf_ct_kfunc_set) BTF_ID_FLAGS(func, bpf_xdp_ct_alloc, KF_ACQUIRE | KF_RET_NULL) diff --git a/net/netfilter/nf_nat_bpf.c b/net/netfilter/nf_nat_bpf.c index 141ee7783223..6e3b2f58855f 100644 --- a/net/netfilter/nf_nat_bpf.c +++ b/net/netfilter/nf_nat_bpf.c @@ -12,9 +12,7 @@ #include <net/netfilter/nf_conntrack_core.h> #include <net/netfilter/nf_nat.h> -__diag_push(); -__diag_ignore_all("-Wmissing-prototypes", - "Global functions as their definitions will be in nf_nat BTF"); +__bpf_kfunc_start_defs(); /* bpf_ct_set_nat_info - Set source or destination nat address * @@ -54,7 +52,7 @@ __bpf_kfunc int bpf_ct_set_nat_info(struct nf_conn___init *nfct, return nf_nat_setup_info(ct, &range, manip) == NF_DROP ? -ENOMEM : 0; } -__diag_pop() +__bpf_kfunc_end_defs(); BTF_SET8_START(nf_nat_kfunc_set) BTF_ID_FLAGS(func, bpf_ct_set_nat_info, KF_TRUSTED_ARGS) diff --git a/net/xfrm/xfrm_interface_bpf.c b/net/xfrm/xfrm_interface_bpf.c index d74f3fd20f2b..7d5e920141e9 100644 --- a/net/xfrm/xfrm_interface_bpf.c +++ b/net/xfrm/xfrm_interface_bpf.c @@ -27,9 +27,7 @@ struct bpf_xfrm_info { int link; }; -__diag_push(); -__diag_ignore_all("-Wmissing-prototypes", - "Global functions as their definitions will be in xfrm_interface BTF"); +__bpf_kfunc_start_defs(); /* bpf_skb_get_xfrm_info - Get XFRM metadata * @@ -93,7 +91,7 @@ __bpf_kfunc int bpf_skb_set_xfrm_info(struct __sk_buff *skb_ctx, const struct bp return 0; } -__diag_pop() +__bpf_kfunc_end_defs(); BTF_SET8_START(xfrm_ifc_kfunc_set) BTF_ID_FLAGS(func, bpf_skb_get_xfrm_info) -- 2.34.1
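For readers less familiar with the kfunc plumbing, a minimal sketch of how a definition wrapped in the new macros is typically exposed to BPF programs is shown below; the kfunc name, its body, the id-set name and the chosen program type are illustrative assumptions, not part of this patch:

#include <linux/bpf.h>
#include <linux/btf.h>
#include <linux/btf_ids.h>
#include <linux/init.h>
#include <linux/module.h>

__bpf_kfunc_start_defs();

/* hypothetical kfunc, for illustration only */
__bpf_kfunc u64 bpf_example_square(u64 x)
{
	return x * x;
}

__bpf_kfunc_end_defs();

BTF_SET8_START(example_kfunc_ids)
BTF_ID_FLAGS(func, bpf_example_square)
BTF_SET8_END(example_kfunc_ids)

static const struct btf_kfunc_id_set example_kfunc_set = {
	.owner = THIS_MODULE,
	.set   = &example_kfunc_ids,
};

static int __init example_kfunc_init(void)
{
	/* make the kfunc callable from tracing programs */
	return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &example_kfunc_set);
}
late_initcall(example_kfunc_init);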
From: Dave Marchevsky <davemarchevsky@fb.com> mainline inclusion from mainline-v6.7-rc1 commit 15fb6f2b6c4c3c129adc2412ae12ec15e60a6adb category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Not all uses of __diag_ignore_all(...) in BPF-related code in order to suppress warnings are wrapping kfunc definitions. Some "hook point" definitions - small functions meant to be used as attach points for fentry and similar BPF progs - need to suppress -Wmissing-declarations. We could use __bpf_kfunc_{start,end}_defs added in the previous patch in such cases, but this might be confusing to someone unfamiliar with BPF internals. Instead, this patch adds __bpf_hook_{start,end} macros, currently having the same effect as __bpf_kfunc_{start,end}_defs, then uses them to suppress warnings for two hook points in the kernel itself and some bpf_testmod hook points as well. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Cc: Yafang Shao <laoar.shao@gmail.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/r/20231031215625.2343848-2-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- include/linux/btf.h | 2 ++ kernel/cgroup/rstat.c | 9 +++------ net/socket.c | 8 ++------ tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c | 6 ++---- 4 files changed, 9 insertions(+), 16 deletions(-) diff --git a/include/linux/btf.h b/include/linux/btf.h index dc5ce962f600..59d404e22814 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -92,6 +92,8 @@ "Global kfuncs as their definitions will be in BTF") #define __bpf_kfunc_end_defs() __diag_pop() +#define __bpf_hook_start() __bpf_kfunc_start_defs() +#define __bpf_hook_end() __bpf_kfunc_end_defs() /* * Return the name of the passed struct, if exists, or halt the build if for diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c index 329c10c084a2..c6d6bec9bcce 100644 --- a/kernel/cgroup/rstat.c +++ b/kernel/cgroup/rstat.c @@ -156,19 +156,16 @@ static struct cgroup *cgroup_rstat_cpu_pop_updated(struct cgroup *pos, * optimize away the callsite. Therefore, __weak is needed to ensure that the * call is still emitted, by telling the compiler that we don't know what the * function might eventually be. - * - * __diag_* below are needed to dismiss the missing prototype warning. */ -__diag_push(); -__diag_ignore_all("-Wmissing-prototypes", - "kfuncs which will be used in BPF programs"); + +__bpf_hook_start(); __weak noinline void bpf_rstat_flush(struct cgroup *cgrp, struct cgroup *parent, int cpu) { } -__diag_pop(); +__bpf_hook_end(); /* see cgroup_rstat_flush() */ static void cgroup_rstat_flush_locked(struct cgroup *cgrp) diff --git a/net/socket.c b/net/socket.c index 23c08b0d2899..15b696635e43 100644 --- a/net/socket.c +++ b/net/socket.c @@ -1696,20 +1696,16 @@ struct file *__sys_socket_file(int family, int type, int protocol) * Therefore, __weak is needed to ensure that the call is still * emitted, by telling the compiler that we don't know what the * function might eventually be. - * - * __diag_* below are needed to dismiss the missing prototype warning. 
*/ -__diag_push(); -__diag_ignore_all("-Wmissing-prototypes", - "A fmod_ret entry point for BPF programs"); +__bpf_hook_start(); __weak noinline int update_socket_protocol(int family, int type, int protocol) { return protocol; } -__diag_pop(); +__bpf_hook_end(); int __sys_socket(int family, int type, int protocol) { diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c index 2e8adf059fa3..1b8589143bd3 100644 --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c @@ -40,9 +40,7 @@ struct bpf_testmod_struct_arg_4 { int b; }; -__diag_push(); -__diag_ignore_all("-Wmissing-prototypes", - "Global functions as their definitions will be in bpf_testmod.ko BTF"); +__bpf_hook_start(); noinline int bpf_testmod_test_struct_arg_1(struct bpf_testmod_struct_arg_2 a, int b, int c) { @@ -332,7 +330,7 @@ noinline int bpf_fentry_shadow_test(int a) } EXPORT_SYMBOL_GPL(bpf_fentry_shadow_test); -__diag_pop(); +__bpf_hook_end(); static struct bin_attribute bin_attr_bpf_testmod_file __ro_after_init = { .attr = { .name = "bpf_testmod", .mode = 0666, }, -- 2.34.1
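To make the consumer side concrete, the sketch below shows a BPF program attaching to the update_socket_protocol() hook wrapped above, using fmod_ret so that its return value replaces the protocol seen by __sys_socket(); the program name, constants and rewrite logic are illustrative assumptions only:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char _license[] SEC("license") = "GPL";

/* vmlinux.h does not carry these userspace constants */
#define AF_INET		2
#define SOCK_DGRAM	2
#define IPPROTO_UDP	17

SEC("fmod_ret/update_socket_protocol")
int BPF_PROG(pick_protocol, int family, int type, int protocol)
{
	/* make the default protocol explicit for AF_INET datagram sockets */
	if (family == AF_INET && type == SOCK_DGRAM && protocol == 0)
		return IPPROTO_UDP;
	return protocol;
}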
From: Chuyi Zhou <zhouchuyi@bytedance.com> mainline inclusion from mainline-v6.7-rc1 commit 3091b667498b0a212e760e1033e5f9b8c33a948f category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- The newly added open-coded css_task iter would try to hold the global css_set_lock in bpf_iter_css_task_new, so the BPF side has to be careful about where it allows this iter to be used. The main concern is deadlocking on css_set_lock. check_css_task_iter_allowlist() in the verifier enforced that css_task could only be used in bpf_lsm hooks and sleepable bpf_iter. This patch relaxes the allowlist for the css_task iter. Any LSM prog, any iter (even non-sleepable) and any sleepable prog is safe, since none of them hold css_set_lock before entering the BPF prog context. This patch also fixes the misuse of BPF_TRACE_ITER in check_css_task_iter_allowlist(), which compared a bpf_prog_type with a bpf_attach_type. Fixes: 9c66dc94b62ae ("bpf: Introduce css_task open-coded iterator kfuncs") Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20231031050438.93297-2-zhouchuyi@bytedance.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- kernel/bpf/verifier.c | 16 ++++++++++++---- .../selftests/bpf/progs/iters_task_failure.c | 4 ++-- 2 files changed, 14 insertions(+), 6 deletions(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index fb230e9a5459..4683a6184a64 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -11451,6 +11451,12 @@ static int process_kf_arg_ptr_to_rbtree_node(struct bpf_verifier_env *env, &meta->arg_rbtree_root.field); } +/* + * css_task iter allowlist is needed to avoid dead locking on css_set_lock. + * LSM hooks and iters (both sleepable and non-sleepable) are safe. + * Any sleepable progs are also safe since bpf_check_attach_target() enforce + * them can only be attached to some specific hook points.
+ */ static bool check_css_task_iter_allowlist(struct bpf_verifier_env *env) { enum bpf_prog_type prog_type = resolve_prog_type(env->prog); @@ -11458,10 +11464,12 @@ static bool check_css_task_iter_allowlist(struct bpf_verifier_env *env) switch (prog_type) { case BPF_PROG_TYPE_LSM: return true; - case BPF_TRACE_ITER: - return env->prog->aux->sleepable; + case BPF_PROG_TYPE_TRACING: + if (env->prog->expected_attach_type == BPF_TRACE_ITER) + return true; + fallthrough; default: - return false; + return env->prog->aux->sleepable; } } @@ -11711,7 +11719,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_ case KF_ARG_PTR_TO_ITER: if (meta->func_id == special_kfunc_list[KF_bpf_iter_css_task_new]) { if (!check_css_task_iter_allowlist(env)) { - verbose(env, "css_task_iter is only allowed in bpf_lsm and bpf iter-s\n"); + verbose(env, "css_task_iter is only allowed in bpf_lsm, bpf_iter and sleepable progs\n"); return -EINVAL; } } diff --git a/tools/testing/selftests/bpf/progs/iters_task_failure.c b/tools/testing/selftests/bpf/progs/iters_task_failure.c index c3bf96a67dba..6b1588d70652 100644 --- a/tools/testing/selftests/bpf/progs/iters_task_failure.c +++ b/tools/testing/selftests/bpf/progs/iters_task_failure.c @@ -84,8 +84,8 @@ int BPF_PROG(iter_css_lock_and_unlock) return 0; } -SEC("?fentry.s/" SYS_PREFIX "sys_getpgid") -__failure __msg("css_task_iter is only allowed in bpf_lsm and bpf iter-s") +SEC("?fentry/" SYS_PREFIX "sys_getpgid") +__failure __msg("css_task_iter is only allowed in bpf_lsm, bpf_iter and sleepable progs") int BPF_PROG(iter_css_task_for_each) { u64 cg_id = bpf_get_current_cgroup_id(); -- 2.34.1
From: Chuyi Zhou <zhouchuyi@bytedance.com> mainline inclusion from mainline-v6.7-rc1 commit f49843afde6771ef6ed5d021eacafacfc98a58bf category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- This patch adds a test which demonstrates how css_task iter can be combined with cgroup iter and it won't cause deadlock, though cgroup iter is not sleepable. Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20231031050438.93297-3-zhouchuyi@bytedance.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- .../selftests/bpf/prog_tests/cgroup_iter.c | 33 ++++++++++++++ .../selftests/bpf/progs/iters_css_task.c | 44 +++++++++++++++++++ 2 files changed, 77 insertions(+) diff --git a/tools/testing/selftests/bpf/prog_tests/cgroup_iter.c b/tools/testing/selftests/bpf/prog_tests/cgroup_iter.c index e02feb5fae97..574d9a0cdc8e 100644 --- a/tools/testing/selftests/bpf/prog_tests/cgroup_iter.c +++ b/tools/testing/selftests/bpf/prog_tests/cgroup_iter.c @@ -4,6 +4,7 @@ #include <test_progs.h> #include <bpf/libbpf.h> #include <bpf/btf.h> +#include "iters_css_task.skel.h" #include "cgroup_iter.skel.h" #include "cgroup_helpers.h" @@ -263,6 +264,35 @@ static void test_walk_dead_self_only(struct cgroup_iter *skel) close(cgrp_fd); } +static void test_walk_self_only_css_task(void) +{ + struct iters_css_task *skel; + int err; + + skel = iters_css_task__open(); + if (!ASSERT_OK_PTR(skel, "skel_open")) + return; + + bpf_program__set_autoload(skel->progs.cgroup_id_printer, true); + + err = iters_css_task__load(skel); + if (!ASSERT_OK(err, "skel_load")) + goto cleanup; + + err = join_cgroup(cg_path[CHILD2]); + if (!ASSERT_OK(err, "join_cgroup")) + goto cleanup; + + skel->bss->target_pid = getpid(); + snprintf(expected_output, sizeof(expected_output), + PROLOGUE "%8llu\n" EPILOGUE, cg_id[CHILD2]); + read_from_cgroup_iter(skel->progs.cgroup_id_printer, cg_fd[CHILD2], + BPF_CGROUP_ITER_SELF_ONLY, "test_walk_self_only_css_task"); + ASSERT_EQ(skel->bss->css_task_cnt, 1, "css_task_cnt"); +cleanup: + iters_css_task__destroy(skel); +} + void test_cgroup_iter(void) { struct cgroup_iter *skel = NULL; @@ -293,6 +323,9 @@ void test_cgroup_iter(void) test_walk_self_only(skel); if (test__start_subtest("cgroup_iter__dead_self_only")) test_walk_dead_self_only(skel); + if (test__start_subtest("cgroup_iter__self_only_css_task")) + test_walk_self_only_css_task(); + out: cgroup_iter__destroy(skel); cleanup_cgroups(); diff --git a/tools/testing/selftests/bpf/progs/iters_css_task.c b/tools/testing/selftests/bpf/progs/iters_css_task.c index 5089ce384a1c..384ff806990f 100644 --- a/tools/testing/selftests/bpf/progs/iters_css_task.c +++ b/tools/testing/selftests/bpf/progs/iters_css_task.c @@ -10,6 +10,7 @@ char _license[] SEC("license") = "GPL"; +struct cgroup *bpf_cgroup_acquire(struct cgroup *p) __ksym; struct cgroup *bpf_cgroup_from_id(u64 cgid) __ksym; void bpf_cgroup_release(struct cgroup *p) __ksym; @@ -45,3 +46,46 @@ int BPF_PROG(iter_css_task_for_each, struct vm_area_struct *vma, return -EPERM; } + +static inline u64 cgroup_id(struct cgroup *cgrp) +{ + return cgrp->kn->id; +} + +SEC("?iter/cgroup") +int cgroup_id_printer(struct bpf_iter__cgroup *ctx) +{ + struct seq_file *seq = ctx->meta->seq; + struct cgroup *cgrp, 
*acquired; + struct cgroup_subsys_state *css; + struct task_struct *task; + u64 cgrp_id; + + cgrp = ctx->cgroup; + + /* epilogue */ + if (cgrp == NULL) { + BPF_SEQ_PRINTF(seq, "epilogue\n"); + return 0; + } + + /* prologue */ + if (ctx->meta->seq_num == 0) + BPF_SEQ_PRINTF(seq, "prologue\n"); + + cgrp_id = cgroup_id(cgrp); + + BPF_SEQ_PRINTF(seq, "%8llu\n", cgrp_id); + + acquired = bpf_cgroup_from_id(cgrp_id); + if (!acquired) + return 0; + css = &acquired->self; + css_task_cnt = 0; + bpf_for_each(css_task, task, css, CSS_TASK_ITER_PROCS) { + if (task->pid == target_pid) + css_task_cnt++; + } + bpf_cgroup_release(acquired); + return 0; +} -- 2.34.1
From: Chuyi Zhou <zhouchuyi@bytedance.com> mainline inclusion from mainline-v6.7-rc1 commit d8234d47c4aa494d789b85562fa90e837b4575f9 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- This patch adds a test to prove that the css_task iter can be used in normal sleepable progs. Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20231031050438.93297-4-zhouchuyi@bytedance.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- .../testing/selftests/bpf/prog_tests/iters.c | 1 + .../selftests/bpf/progs/iters_css_task.c | 19 +++++++++++++++++++ 2 files changed, 20 insertions(+) diff --git a/tools/testing/selftests/bpf/prog_tests/iters.c b/tools/testing/selftests/bpf/prog_tests/iters.c index c2425791c923..bf84d4a1d9ae 100644 --- a/tools/testing/selftests/bpf/prog_tests/iters.c +++ b/tools/testing/selftests/bpf/prog_tests/iters.c @@ -294,6 +294,7 @@ void test_iters(void) RUN_TESTS(iters_state_safety); RUN_TESTS(iters_looping); RUN_TESTS(iters); + RUN_TESTS(iters_css_task); if (env.has_testmod) RUN_TESTS(iters_testmod_seq); diff --git a/tools/testing/selftests/bpf/progs/iters_css_task.c b/tools/testing/selftests/bpf/progs/iters_css_task.c index 384ff806990f..e180aa1b1d52 100644 --- a/tools/testing/selftests/bpf/progs/iters_css_task.c +++ b/tools/testing/selftests/bpf/progs/iters_css_task.c @@ -89,3 +89,22 @@ int cgroup_id_printer(struct bpf_iter__cgroup *ctx) bpf_cgroup_release(acquired); return 0; } + +SEC("?fentry.s/" SYS_PREFIX "sys_getpgid") +int BPF_PROG(iter_css_task_for_each_sleep) +{ + u64 cgrp_id = bpf_get_current_cgroup_id(); + struct cgroup *cgrp = bpf_cgroup_from_id(cgrp_id); + struct cgroup_subsys_state *css; + struct task_struct *task; + + if (cgrp == NULL) + return 0; + css = &cgrp->self; + + bpf_for_each(css_task, task, css, CSS_TASK_ITER_PROCS) { + + } + bpf_cgroup_release(cgrp); + return 0; +} -- 2.34.1
From: Chuyi Zhou <zhouchuyi@bytedance.com> mainline inclusion from mainline-v6.7-rc1 commit 0de4f50de25af79c2a46db55d70cdbd8f985c6d1 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- BTF_TYPE_SAFE_TRUSTED(struct bpf_iter__task) in verifier.c wanted to teach the BPF verifier that bpf_iter__task -> task is a trusted ptr, but it doesn't work well. The reason is that bpf_iter__task -> task goes through btf_ctx_access(), which enforces that the reg_type of 'task' is ctx_arg_info->reg_type, and in task_iter.c we actually explicitly declare that ctx_arg_info->reg_type is PTR_TO_BTF_ID_OR_NULL. We have a previous case like this[1] where PTR_TRUSTED was added to the arg flag for map_iter. This patch sets ctx_arg_info->reg_type to PTR_TO_BTF_ID_OR_NULL | PTR_TRUSTED in task_reg_info. Similarly, bpf_cgroup_reg_info -> cgroup is also PTR_TRUSTED, since we are under the protection of cgroup_mutex and we check cgroup_is_dead() in __cgroup_iter_seq_show(). This patch improves the user experience of the newly introduced bpf_iter_css_task kfunc before it hits mainline. The Fixes tag points to the commit that introduced the bpf_iter_css_task kfunc. Link[1]:https://lore.kernel.org/all/20230706133932.45883-3-aspsk@isovalent.com/ Fixes: 9c66dc94b62a ("bpf: Introduce css_task open-coded iterator kfuncs") Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20231107132204.912120-2-zhouchuyi@bytedance.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- kernel/bpf/cgroup_iter.c | 2 +- kernel/bpf/task_iter.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/bpf/cgroup_iter.c b/kernel/bpf/cgroup_iter.c index d1b5c5618dd7..f04a468cf6a7 100644 --- a/kernel/bpf/cgroup_iter.c +++ b/kernel/bpf/cgroup_iter.c @@ -282,7 +282,7 @@ static struct bpf_iter_reg bpf_cgroup_reg_info = { .ctx_arg_info_size = 1, .ctx_arg_info = { { offsetof(struct bpf_iter__cgroup, cgroup), - PTR_TO_BTF_ID_OR_NULL }, + PTR_TO_BTF_ID_OR_NULL | PTR_TRUSTED }, }, .seq_info = &cgroup_iter_seq_info, }; diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c index ea8d2cf3f52c..145c506899e4 100644 --- a/kernel/bpf/task_iter.c +++ b/kernel/bpf/task_iter.c @@ -726,7 +726,7 @@ static struct bpf_iter_reg task_reg_info = { .ctx_arg_info_size = 1, .ctx_arg_info = { { offsetof(struct bpf_iter__task, task), - PTR_TO_BTF_ID_OR_NULL }, + PTR_TO_BTF_ID_OR_NULL | PTR_TRUSTED }, }, .seq_info = &task_seq_info, .fill_link_info = bpf_iter_fill_link_info, -- 2.34.1
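As a usage-level illustration of what the added PTR_TRUSTED flag buys, the sketch below passes ctx->task straight to a trusted-args kfunc from a task iterator program; the kfunc __ksym declarations and the program itself are assumptions for illustration, not part of this patch:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

char _license[] SEC("license") = "GPL";

struct task_struct *bpf_task_acquire(struct task_struct *p) __ksym;
void bpf_task_release(struct task_struct *p) __ksym;

SEC("iter/task")
int trusted_task_iter(struct bpf_iter__task *ctx)
{
	struct task_struct *task = ctx->task;
	struct task_struct *acquired;

	if (!task)
		return 0;

	/* with PTR_TRUSTED on ctx->task, the pointer can be handed to
	 * trusted-args kfuncs directly after the NULL check
	 */
	acquired = bpf_task_acquire(task);
	if (!acquired)
		return 0;
	bpf_task_release(acquired);
	return 0;
}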
From: Chuyi Zhou <zhouchuyi@bytedance.com> mainline inclusion from mainline-v6.7-rc1 commit 3c5864ba9cf912ff9809f315d28f296f21563cce category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Commit f49843afde (selftests/bpf: Add tests for css_task iter combining with cgroup iter) added a test which demonstrates how css_task iter can be combined with cgroup iter. That test used bpf_cgroup_from_id() to convert bpf_iter__cgroup->cgroup to a trusted ptr which is pointless now, since with the previous fix, we can get a trusted cgroup directly from bpf_iter__cgroup. Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20231107132204.912120-3-zhouchuyi@bytedance.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- .../testing/selftests/bpf/progs/iters_css_task.c | 16 ++++------------ 1 file changed, 4 insertions(+), 12 deletions(-) diff --git a/tools/testing/selftests/bpf/progs/iters_css_task.c b/tools/testing/selftests/bpf/progs/iters_css_task.c index e180aa1b1d52..9ac758649cb8 100644 --- a/tools/testing/selftests/bpf/progs/iters_css_task.c +++ b/tools/testing/selftests/bpf/progs/iters_css_task.c @@ -56,12 +56,9 @@ SEC("?iter/cgroup") int cgroup_id_printer(struct bpf_iter__cgroup *ctx) { struct seq_file *seq = ctx->meta->seq; - struct cgroup *cgrp, *acquired; + struct cgroup *cgrp = ctx->cgroup; struct cgroup_subsys_state *css; struct task_struct *task; - u64 cgrp_id; - - cgrp = ctx->cgroup; /* epilogue */ if (cgrp == NULL) { @@ -73,20 +70,15 @@ int cgroup_id_printer(struct bpf_iter__cgroup *ctx) if (ctx->meta->seq_num == 0) BPF_SEQ_PRINTF(seq, "prologue\n"); - cgrp_id = cgroup_id(cgrp); - - BPF_SEQ_PRINTF(seq, "%8llu\n", cgrp_id); + BPF_SEQ_PRINTF(seq, "%8llu\n", cgroup_id(cgrp)); - acquired = bpf_cgroup_from_id(cgrp_id); - if (!acquired) - return 0; - css = &acquired->self; + css = &cgrp->self; css_task_cnt = 0; bpf_for_each(css_task, task, css, CSS_TASK_ITER_PROCS) { if (task->pid == target_pid) css_task_cnt++; } - bpf_cgroup_release(acquired); + return 0; } -- 2.34.1
From: Andrii Nakryiko <andrii@kernel.org> mainline inclusion from mainline-v6.8-rc1 commit 491dd8edecbc5027ee317f3f1e7e9800fb66d88f category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- We have the name, instead of emitting just func#N to identify global subprog, augment verifier log messages with actual function name to make it more user-friendly. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20231124035937.403208-2-andrii@kernel.org Conflicts: kernel/bpf/verifier.c [Fix conflicts because of cve patches and missing previous patches.] Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- kernel/bpf/verifier.c | 35 ++++++++++++++++++++++++----------- 1 file changed, 24 insertions(+), 11 deletions(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 4683a6184a64..29b1cbbf60be 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -339,6 +339,11 @@ struct bpf_kfunc_call_arg_meta { struct btf *btf_vmlinux; +static const char *btf_type_name(const struct btf *btf, u32 id) +{ + return btf_name_by_offset(btf, btf_type_by_id(btf, id)->name_off); +} + static DEFINE_MUTEX(bpf_verifier_lock); static const struct bpf_line_info * @@ -503,6 +508,17 @@ static bool subprog_is_global(const struct bpf_verifier_env *env, int subprog) return aux && aux[subprog].linkage == BTF_FUNC_GLOBAL; } +static const char *subprog_name(const struct bpf_verifier_env *env, int subprog) +{ + struct bpf_func_info *info; + + if (!env->prog->aux->func_info) + return ""; + + info = &env->prog->aux->func_info[subprog]; + return btf_type_name(env->prog->aux->btf, info->type_id); +} + static bool reg_may_point_to_spin_lock(const struct bpf_reg_state *reg) { return btf_record_has_field(reg_btf_record(reg), BPF_SPIN_LOCK); @@ -748,11 +764,6 @@ static int iter_get_spi(struct bpf_verifier_env *env, struct bpf_reg_state *reg, return stack_slot_obj_get_spi(env, reg, "iter", nr_slots); } -static const char *btf_type_name(const struct btf *btf, u32 id) -{ - return btf_name_by_offset(btf, btf_type_by_id(btf, id)->name_off); -} - static const char *dynptr_type_str(enum bpf_dynptr_type type) { switch (type) { @@ -9472,13 +9483,16 @@ static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn, if (err == -EFAULT) return err; if (subprog_is_global(env, subprog)) { + const char *sub_name = subprog_name(env, subprog); + if (err) { - verbose(env, "Caller passes invalid args into func#%d\n", subprog); + verbose(env, "Caller passes invalid args into func#%d ('%s')\n", + subprog, sub_name); return err; } - if (env->log.level & BPF_LOG_LEVEL) - verbose(env, "Func#%d is global and valid. 
Skipping.\n", subprog); + verbose(env, "Func#%d ('%s') is global and assumed valid.\n", + subprog, sub_name); if (env->subprog_info[subprog].changes_pkt_data) clear_all_pkt_pointers(env); clear_caller_saved_regs(env, caller->regs); @@ -20117,9 +20131,8 @@ static int do_check_subprogs(struct bpf_verifier_env *env) if (ret) { return ret; } else if (env->log.level & BPF_LOG_LEVEL) { - verbose(env, - "Func#%d is safe for any args that match its prototype\n", - i); + verbose(env, "Func#%d ('%s') is safe for any args that match its prototype\n", + i, subprog_name(env, i)); } } return 0; -- 2.34.1
From: Andrii Nakryiko <andrii@kernel.org> mainline inclusion from mainline-v6.8-rc1 commit 2afae08c9dcb8ac648414277cec70c2fe6a34d9e category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Slightly change BPF verifier logic around eagerness and order of global subprog validation. Instead of going over every global subprog eagerly and validating it before main (entry) BPF program is verified, turn it around. Validate main program first, mark subprogs that were called from main program for later verification, but otherwise assume it is valid. Afterwards, go over marked global subprogs and validate those, potentially marking some more global functions as being called. Continue this process until all (transitively) callable global subprogs are validated. It's a BFS traversal at its heart and will always converge. This is an important change because it allows to feature-gate some subprograms that might not be verifiable on some older kernel, depending on supported set of features. E.g., at some point, global functions were allowed to accept a pointer to memory, which size is identified by user-provided type. Unfortunately, older kernels don't support this feature. With BPF CO-RE approach, the natural way would be to still compile BPF object file once and guard calls to this global subprog with some CO-RE check or using .rodata variables. That's what people do to guard usage of new helpers or kfuncs, and any other new BPF-side feature that might be missing on old kernels. That's currently impossible to do with global subprogs, unfortunately, because they are eagerly and unconditionally validated. This patch set aims to change this, so that in the future when global funcs gain new features, those can be guarded using BPF CO-RE techniques in the same fashion as any other new kernel feature. Two selftests had to be adjusted in sync with these changes. test_global_func12 relied on eager global subprog validation failing before main program failure is detected (unknown return value). Fix by making sure that main program is always valid. verifier_subprog_precision's parent_stack_slot_precise subtest relied on verifier checkpointing heuristic to do a checkpoint at instruction #5, but that's no longer true because we don't have enough jumps validated before reaching insn #5 due to global subprogs being validated later. Other than that, no changes, as one would expect. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20231124035937.403208-3-andrii@kernel.org Conflicts: kernel/bpf/verifier.c [Fix conflicts because of missing pre-patches BPF exceptions and cve.] 
Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- include/linux/bpf.h | 2 + kernel/bpf/verifier.c | 43 ++++++++++++++++--- .../selftests/bpf/progs/test_global_func12.c | 4 +- .../bpf/progs/verifier_subprog_precision.c | 4 +- 4 files changed, 43 insertions(+), 10 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index d6498fbda50a..3c8853d9c255 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1434,6 +1434,8 @@ static inline bool bpf_prog_has_trampoline(const struct bpf_prog *prog) struct bpf_func_info_aux { u16 linkage; bool unreliable; + bool called : 1; + bool verified : 1; }; enum bpf_jit_poke_reason { diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 29b1cbbf60be..aee4419f4649 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -519,6 +519,11 @@ static const char *subprog_name(const struct bpf_verifier_env *env, int subprog) return btf_type_name(env->prog->aux->btf, info->type_id); } +static struct bpf_func_info_aux *subprog_aux(const struct bpf_verifier_env *env, int subprog) +{ + return &env->prog->aux->func_info_aux[subprog]; +} + static bool reg_may_point_to_spin_lock(const struct bpf_reg_state *reg) { return btf_record_has_field(reg_btf_record(reg), BPF_SPIN_LOCK); @@ -9495,6 +9500,8 @@ static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn, subprog, sub_name); if (env->subprog_info[subprog].changes_pkt_data) clear_all_pkt_pointers(env); + /* mark global subprog for verifying after main prog */ + subprog_aux(env, subprog)->called = true; clear_caller_saved_regs(env, caller->regs); /* All global functions return a 64-bit SCALAR_VALUE */ @@ -20097,8 +20104,11 @@ static int do_check_common(struct bpf_verifier_env *env, int subprog) return ret; } -/* Verify all global functions in a BPF program one by one based on their BTF. - * All global functions must pass verification. Otherwise the whole program is rejected. +/* Lazily verify all global functions based on their BTF, if they are called + * from main BPF program or any of subprograms transitively. + * BPF global subprogs called from dead code are not validated. + * All callable global functions must pass verification. + * Otherwise the whole program is rejected. * Consider: * int bar(int); * int foo(int f) @@ -20117,13 +20127,20 @@ static int do_check_common(struct bpf_verifier_env *env, int subprog) static int do_check_subprogs(struct bpf_verifier_env *env) { struct bpf_prog_aux *aux = env->prog->aux; - int i, ret; + struct bpf_func_info_aux *sub_aux; + int i, ret, new_cnt; if (!aux->func_info) return 0; +again: + new_cnt = 0; for (i = 1; i < env->subprog_cnt; i++) { - if (aux->func_info_aux[i].linkage != BTF_FUNC_GLOBAL) + if (!subprog_is_global(env, i)) + continue; + + sub_aux = subprog_aux(env, i); + if (!sub_aux->called || sub_aux->verified) continue; env->insn_idx = env->subprog_info[i].start; WARN_ON_ONCE(env->insn_idx == 0); @@ -20134,7 +20151,21 @@ static int do_check_subprogs(struct bpf_verifier_env *env) verbose(env, "Func#%d ('%s') is safe for any args that match its prototype\n", i, subprog_name(env, i)); } + + /* We verified new global subprog, it might have called some + * more global subprogs that we haven't verified yet, so we + * need to do another pass over subprogs to verify those. + */ + sub_aux->verified = true; + new_cnt++; } + + /* We can't loop forever as we verify at least one global subprog on + * each pass. 
+ */ + if (new_cnt) + goto again; + return 0; } @@ -20833,8 +20864,8 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3 if (ret) goto skip_full_check; - ret = do_check_subprogs(env); - ret = ret ?: do_check_main(env); + ret = do_check_main(env); + ret = ret ?: do_check_subprogs(env); if (ret == 0 && bpf_prog_is_offloaded(env->prog->aux)) ret = bpf_prog_offload_finalize(env); diff --git a/tools/testing/selftests/bpf/progs/test_global_func12.c b/tools/testing/selftests/bpf/progs/test_global_func12.c index 7f159d83c6f6..6e03d42519a6 100644 --- a/tools/testing/selftests/bpf/progs/test_global_func12.c +++ b/tools/testing/selftests/bpf/progs/test_global_func12.c @@ -19,5 +19,7 @@ int global_func12(struct __sk_buff *skb) { const struct S s = {.x = skb->len }; - return foo(&s); + foo(&s); + + return 1; } diff --git a/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c b/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c index 4b8b0f45d17d..83d187de7f6b 100644 --- a/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c +++ b/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c @@ -370,12 +370,10 @@ __naked int parent_stack_slot_precise(void) SEC("?raw_tp") __success __log_level(2) __msg("9: (0f) r1 += r6") -__msg("mark_precise: frame0: last_idx 9 first_idx 6") +__msg("mark_precise: frame0: last_idx 9 first_idx 0") __msg("mark_precise: frame0: regs=r6 stack= before 8: (bf) r1 = r7") __msg("mark_precise: frame0: regs=r6 stack= before 7: (27) r6 *= 4") __msg("mark_precise: frame0: regs=r6 stack= before 6: (79) r6 = *(u64 *)(r10 -8)") -__msg("mark_precise: frame0: parent state regs= stack=-8:") -__msg("mark_precise: frame0: last_idx 5 first_idx 0") __msg("mark_precise: frame0: regs= stack=-8 before 5: (85) call pc+6") __msg("mark_precise: frame0: regs= stack=-8 before 4: (b7) r1 = 0") __msg("mark_precise: frame0: regs= stack=-8 before 3: (7b) *(u64 *)(r10 -8) = r6") -- 2.34.1
hulk inclusion category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA -------------------------------- Fix KABI breakage by wrapping the new bit-fields in KABI_FILL_HOLE, since they fit into an existing padding hole in bpf_func_info_aux. Fixes: a88fed0a5443 ("bpf: Validate global subprogs lazily") Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- include/linux/bpf.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 3c8853d9c255..511168c6dc2f 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1434,8 +1434,8 @@ static inline bool bpf_prog_has_trampoline(const struct bpf_prog *prog) struct bpf_func_info_aux { u16 linkage; bool unreliable; - bool called : 1; - bool verified : 1; + KABI_FILL_HOLE(bool called : 1) + KABI_FILL_HOLE(bool verified : 1) }; enum bpf_jit_poke_reason { -- 2.34.1
From: Andrii Nakryiko <andrii@kernel.org> mainline inclusion from mainline-v6.8-rc1 commit e8a339b5235e294f29153149ea7cf26a9a87dbea category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Add a few test that validate BPF verifier's lazy approach to validating global subprogs. We check that global subprogs that are called transitively through another global subprog is validated. We also check that invalid global subprog is not validated, if it's not called from the main program. And we also check that main program is always validated first, before any of the subprogs. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20231124035937.403208-4-andrii@kernel.org Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- .../selftests/bpf/prog_tests/verifier.c | 2 + .../bpf/progs/verifier_global_subprogs.c | 92 +++++++++++++++++++ 2 files changed, 94 insertions(+) create mode 100644 tools/testing/selftests/bpf/progs/verifier_global_subprogs.c diff --git a/tools/testing/selftests/bpf/prog_tests/verifier.c b/tools/testing/selftests/bpf/prog_tests/verifier.c index 5cfa7a6316b6..8d746642cbd7 100644 --- a/tools/testing/selftests/bpf/prog_tests/verifier.c +++ b/tools/testing/selftests/bpf/prog_tests/verifier.c @@ -25,6 +25,7 @@ #include "verifier_direct_stack_access_wraparound.skel.h" #include "verifier_div0.skel.h" #include "verifier_div_overflow.skel.h" +#include "verifier_global_subprogs.skel.h" #include "verifier_gotol.skel.h" #include "verifier_helper_access_var_len.skel.h" #include "verifier_helper_packet_access.skel.h" @@ -134,6 +135,7 @@ void test_verifier_direct_packet_access(void) { RUN(verifier_direct_packet_acces void test_verifier_direct_stack_access_wraparound(void) { RUN(verifier_direct_stack_access_wraparound); } void test_verifier_div0(void) { RUN(verifier_div0); } void test_verifier_div_overflow(void) { RUN(verifier_div_overflow); } +void test_verifier_global_subprogs(void) { RUN(verifier_global_subprogs); } void test_verifier_gotol(void) { RUN(verifier_gotol); } void test_verifier_helper_access_var_len(void) { RUN(verifier_helper_access_var_len); } void test_verifier_helper_packet_access(void) { RUN(verifier_helper_packet_access); } diff --git a/tools/testing/selftests/bpf/progs/verifier_global_subprogs.c b/tools/testing/selftests/bpf/progs/verifier_global_subprogs.c new file mode 100644 index 000000000000..a0a5efd1caa1 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/verifier_global_subprogs.c @@ -0,0 +1,92 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2023 Meta Platforms, Inc. and affiliates. 
*/ + +#include <stdbool.h> +#include <errno.h> +#include <string.h> +#include <linux/bpf.h> +#include <bpf/bpf_helpers.h> +#include "bpf_misc.h" + +int arr[1]; +int unkn_idx; + +__noinline long global_bad(void) +{ + return arr[unkn_idx]; /* BOOM */ +} + +__noinline long global_good(void) +{ + return arr[0]; +} + +__noinline long global_calls_bad(void) +{ + return global_good() + global_bad() /* does BOOM indirectly */; +} + +__noinline long global_calls_good_only(void) +{ + return global_good(); +} + +SEC("?raw_tp") +__success __log_level(2) +/* main prog is validated completely first */ +__msg("('global_calls_good_only') is global and assumed valid.") +__msg("1: (95) exit") +/* eventually global_good() is transitively validated as well */ +__msg("Validating global_good() func") +__msg("('global_good') is safe for any args that match its prototype") +int chained_global_func_calls_success(void) +{ + return global_calls_good_only(); +} + +SEC("?raw_tp") +__failure __log_level(2) +/* main prog validated successfully first */ +__msg("1: (95) exit") +/* eventually we validate global_bad() and fail */ +__msg("Validating global_bad() func") +__msg("math between map_value pointer and register") /* BOOM */ +int chained_global_func_calls_bad(void) +{ + return global_calls_bad(); +} + +/* do out of bounds access forcing verifier to fail verification if this + * global func is called + */ +__noinline int global_unsupp(const int *mem) +{ + if (!mem) + return 0; + return mem[100]; /* BOOM */ +} + +const volatile bool skip_unsupp_global = true; + +SEC("?raw_tp") +__success +int guarded_unsupp_global_called(void) +{ + if (!skip_unsupp_global) + return global_unsupp(NULL); + return 0; +} + +SEC("?raw_tp") +__failure __log_level(2) +__msg("Func#1 ('global_unsupp') is global and assumed valid.") +__msg("Validating global_unsupp() func#1...") +__msg("value is outside of the allowed memory range") +int unguarded_unsupp_global_called(void) +{ + int x = 0; + + return global_unsupp(&x); +} + +char _license[] SEC("license") = "GPL"; -- 2.34.1
From: Song Liu <song@kernel.org> mainline inclusion from mainline-v6.8-rc1 commit f08a1c658257c73697a819c4ded3a84b6f0ead74 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Currently, bpf_prog_pack_free only can only free pointer to struct bpf_binary_header, which is not flexible. Add a size argument to bpf_prog_pack_free so that it can handle any pointer. Signed-off-by: Song Liu <song@kernel.org> Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Tested-by: Ilya Leoshkevich <iii@linux.ibm.com> # on s390x Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20231206224054.492250-2-song@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- include/linux/filter.h | 2 +- kernel/bpf/core.c | 21 ++++++++++----------- kernel/bpf/dispatcher.c | 5 +---- 3 files changed, 12 insertions(+), 16 deletions(-) diff --git a/include/linux/filter.h b/include/linux/filter.h index e55b625257bd..8908c5b1c951 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -1057,7 +1057,7 @@ struct bpf_binary_header * bpf_jit_binary_pack_hdr(const struct bpf_prog *fp); void *bpf_prog_pack_alloc(u32 size, bpf_jit_fill_hole_t bpf_fill_ill_insns); -void bpf_prog_pack_free(struct bpf_binary_header *hdr); +void bpf_prog_pack_free(void *ptr, u32 size); static inline bool bpf_prog_kallsyms_verify_off(const struct bpf_prog *fp) { diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 72a7503a59d0..d0436f78c4b7 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -942,20 +942,20 @@ void *bpf_prog_pack_alloc(u32 size, bpf_jit_fill_hole_t bpf_fill_ill_insns) return ptr; } -void bpf_prog_pack_free(struct bpf_binary_header *hdr) +void bpf_prog_pack_free(void *ptr, u32 size) { struct bpf_prog_pack *pack = NULL, *tmp; unsigned int nbits; unsigned long pos; mutex_lock(&pack_mutex); - if (hdr->size > BPF_PROG_PACK_SIZE) { - bpf_jit_free_exec(hdr); + if (size > BPF_PROG_PACK_SIZE) { + bpf_jit_free_exec(ptr); goto out; } list_for_each_entry(tmp, &pack_list, list) { - if ((void *)hdr >= tmp->ptr && (tmp->ptr + BPF_PROG_PACK_SIZE) > (void *)hdr) { + if (ptr >= tmp->ptr && (tmp->ptr + BPF_PROG_PACK_SIZE) > ptr) { pack = tmp; break; } @@ -964,10 +964,10 @@ void bpf_prog_pack_free(struct bpf_binary_header *hdr) if (WARN_ONCE(!pack, "bpf_prog_pack bug\n")) goto out; - nbits = BPF_PROG_SIZE_TO_NBITS(hdr->size); - pos = ((unsigned long)hdr - (unsigned long)pack->ptr) >> BPF_PROG_CHUNK_SHIFT; + nbits = BPF_PROG_SIZE_TO_NBITS(size); + pos = ((unsigned long)ptr - (unsigned long)pack->ptr) >> BPF_PROG_CHUNK_SHIFT; - WARN_ONCE(bpf_arch_text_invalidate(hdr, hdr->size), + WARN_ONCE(bpf_arch_text_invalidate(ptr, size), "bpf_prog_pack bug: missing bpf_arch_text_invalidate?\n"); bitmap_clear(pack->bitmap, pos, nbits); @@ -1114,8 +1114,7 @@ bpf_jit_binary_pack_alloc(unsigned int proglen, u8 **image_ptr, *rw_header = kvmalloc(size, GFP_KERNEL); if (!*rw_header) { - bpf_arch_text_copy(&ro_header->size, &size, sizeof(size)); - bpf_prog_pack_free(ro_header); + bpf_prog_pack_free(ro_header, size); bpf_jit_uncharge_modmem(size); return NULL; } @@ -1146,7 +1145,7 @@ int bpf_jit_binary_pack_finalize(struct bpf_prog *prog, kvfree(rw_header); if (IS_ERR(ptr)) { - bpf_prog_pack_free(ro_header); + bpf_prog_pack_free(ro_header, 
ro_header->size); return PTR_ERR(ptr); } return 0; @@ -1167,7 +1166,7 @@ void bpf_jit_binary_pack_free(struct bpf_binary_header *ro_header, { u32 size = ro_header->size; - bpf_prog_pack_free(ro_header); + bpf_prog_pack_free(ro_header, size); kvfree(rw_header); bpf_jit_uncharge_modmem(size); } diff --git a/kernel/bpf/dispatcher.c b/kernel/bpf/dispatcher.c index fa3e9225aedc..56760fc10e78 100644 --- a/kernel/bpf/dispatcher.c +++ b/kernel/bpf/dispatcher.c @@ -150,10 +150,7 @@ void bpf_dispatcher_change_prog(struct bpf_dispatcher *d, struct bpf_prog *from, goto out; d->rw_image = bpf_jit_alloc_exec(PAGE_SIZE); if (!d->rw_image) { - u32 size = PAGE_SIZE; - - bpf_arch_text_copy(d->image, &size, sizeof(size)); - bpf_prog_pack_free((struct bpf_binary_header *)d->image); + bpf_prog_pack_free(d->image, PAGE_SIZE); d->image = NULL; goto out; } -- 2.34.1
From: Song Liu <song@kernel.org> mainline inclusion from mainline-v6.8-rc1 commit 7a3d9a159b178e87306a6e989071ed9a114a1a31 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- We are using "im" for "struct bpf_tramp_image" and "tr" for "struct bpf_trampoline" in most of the code base. The only exception is the prototype and fallback version of arch_prepare_bpf_trampoline(). Update them to match the rest of the code base. We mix "orig_call" and "func_addr" for the argument in different versions of arch_prepare_bpf_trampoline(). s/orig_call/func_addr/g so they match. Signed-off-by: Song Liu <song@kernel.org> Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Tested-by: Ilya Leoshkevich <iii@linux.ibm.com> # on s390x Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20231206224054.492250-3-song@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- arch/arm64/net/bpf_jit_comp.c | 10 +++++----- include/linux/bpf.h | 4 ++-- kernel/bpf/trampoline.c | 4 ++-- 3 files changed, 9 insertions(+), 9 deletions(-) diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c index 2494a5c96489..50d97e94fe63 100644 --- a/arch/arm64/net/bpf_jit_comp.c +++ b/arch/arm64/net/bpf_jit_comp.c @@ -1890,7 +1890,7 @@ static bool is_struct_ops_tramp(const struct bpf_tramp_links *fentry_links) * */ static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im, - struct bpf_tramp_links *tlinks, void *orig_call, + struct bpf_tramp_links *tlinks, void *func_addr, int nregs, u32 flags) { int i; @@ -1992,7 +1992,7 @@ static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im, if (flags & BPF_TRAMP_F_IP_ARG) { /* save ip address of the traced function */ - emit_addr_mov_i64(A64_R(10), (const u64)orig_call, ctx); + emit_addr_mov_i64(A64_R(10), (const u64)func_addr, ctx); emit(A64_STR64I(A64_R(10), A64_SP, ip_off), ctx); } @@ -2108,7 +2108,7 @@ static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im, int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *image_end, const struct btf_func_model *m, u32 flags, struct bpf_tramp_links *tlinks, - void *orig_call) + void *func_addr) { int i, ret; int nregs = m->nr_args; @@ -2129,7 +2129,7 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, if (nregs > 8) return -ENOTSUPP; - ret = prepare_trampoline(&ctx, im, tlinks, orig_call, nregs, flags); + ret = prepare_trampoline(&ctx, im, tlinks, func_addr, nregs, flags); if (ret < 0) return ret; @@ -2140,7 +2140,7 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, ctx.idx = 0; jit_fill_hole(image, (unsigned int)(image_end - image)); - ret = prepare_trampoline(&ctx, im, tlinks, orig_call, nregs, flags); + ret = prepare_trampoline(&ctx, im, tlinks, func_addr, nregs, flags); if (ret > 0 && validate_code(&ctx) < 0) ret = -EINVAL; diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 511168c6dc2f..5642ba7d6150 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1151,10 +1151,10 @@ struct bpf_tramp_run_ctx; * fexit = a set of program to run after original function */ struct bpf_tramp_image; -int arch_prepare_bpf_trampoline(struct bpf_tramp_image *tr, void *image, void *image_end, +int arch_prepare_bpf_trampoline(struct 
bpf_tramp_image *im, void *image, void *image_end, const struct btf_func_model *m, u32 flags, struct bpf_tramp_links *tlinks, - void *orig_call); + void *func_addr); u64 notrace __bpf_prog_enter_sleepable_recur(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx); void notrace __bpf_prog_exit_sleepable_recur(struct bpf_prog *prog, u64 start, diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c index bb842decab0d..99c7221a8933 100644 --- a/kernel/bpf/trampoline.c +++ b/kernel/bpf/trampoline.c @@ -1060,10 +1060,10 @@ bpf_trampoline_exit_t bpf_trampoline_exit(const struct bpf_prog *prog) } int __weak -arch_prepare_bpf_trampoline(struct bpf_tramp_image *tr, void *image, void *image_end, +arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *image_end, const struct btf_func_model *m, u32 flags, struct bpf_tramp_links *tlinks, - void *orig_call) + void *func_addr) { return -ENOTSUPP; } -- 2.34.1
From: Song Liu <song@kernel.org> mainline inclusion from mainline-v6.8-rc1 commit 82583daa2efc2e336962b231a46bad03a280b3e0 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- As BPF trampoline of different archs moves from bpf_jit_[alloc|free]_exec() to bpf_prog_pack_[alloc|free](), we need to use different _alloc, _free for different archs during the transition. Add the following helpers for this transition: void *arch_alloc_bpf_trampoline(unsigned int size); void arch_free_bpf_trampoline(void *image, unsigned int size); void arch_protect_bpf_trampoline(void *image, unsigned int size); void arch_unprotect_bpf_trampoline(void *image, unsigned int size); The fallback version of these helpers require size <= PAGE_SIZE, but they are only called with size == PAGE_SIZE. They will be called with size < PAGE_SIZE when arch_bpf_trampoline_size() helper is introduced later. Signed-off-by: Song Liu <song@kernel.org> Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Tested-by: Ilya Leoshkevich <iii@linux.ibm.com> # on s390x Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20231206224054.492250-4-song@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- include/linux/bpf.h | 5 ++++ kernel/bpf/bpf_struct_ops.c | 12 ++++----- kernel/bpf/trampoline.c | 46 ++++++++++++++++++++++++++++------ net/bpf/bpf_dummy_struct_ops.c | 7 +++--- 4 files changed, 52 insertions(+), 18 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 5642ba7d6150..689dd882c791 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1155,6 +1155,11 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i const struct btf_func_model *m, u32 flags, struct bpf_tramp_links *tlinks, void *func_addr); +void *arch_alloc_bpf_trampoline(unsigned int size); +void arch_free_bpf_trampoline(void *image, unsigned int size); +void arch_protect_bpf_trampoline(void *image, unsigned int size); +void arch_unprotect_bpf_trampoline(void *image, unsigned int size); + u64 notrace __bpf_prog_enter_sleepable_recur(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx); void notrace __bpf_prog_exit_sleepable_recur(struct bpf_prog *prog, u64 start, diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c index db6176fb64dc..e9e95879bce2 100644 --- a/kernel/bpf/bpf_struct_ops.c +++ b/kernel/bpf/bpf_struct_ops.c @@ -515,7 +515,7 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, if (err) goto reset_unlock; } - set_memory_rox((long)st_map->image, 1); + arch_protect_bpf_trampoline(st_map->image, PAGE_SIZE); /* Let bpf_link handle registration & unregistration. * * Pair with smp_load_acquire() during lookup_elem(). @@ -524,7 +524,7 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, goto unlock; } - set_memory_rox((long)st_map->image, 1); + arch_protect_bpf_trampoline(st_map->image, PAGE_SIZE); err = st_ops->reg(kdata); if (likely(!err)) { /* This refcnt increment on the map here after @@ -547,8 +547,7 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, * there was a race in registering the struct_ops (under the same name) to * a sub-system through different struct_ops's maps. 
*/ - set_memory_nx((long)st_map->image, 1); - set_memory_rw((long)st_map->image, 1); + arch_unprotect_bpf_trampoline(st_map->image, PAGE_SIZE); reset_unlock: bpf_struct_ops_map_put_progs(st_map); @@ -616,7 +615,7 @@ static void __bpf_struct_ops_map_free(struct bpf_map *map) bpf_struct_ops_map_put_progs(st_map); bpf_map_area_free(st_map->links); if (st_map->image) { - bpf_jit_free_exec(st_map->image); + arch_free_bpf_trampoline(st_map->image, PAGE_SIZE); bpf_jit_uncharge_modmem(PAGE_SIZE); } bpf_map_area_free(st_map->uvalue); @@ -691,7 +690,7 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr) return ERR_PTR(ret); } - st_map->image = bpf_jit_alloc_exec(PAGE_SIZE); + st_map->image = arch_alloc_bpf_trampoline(PAGE_SIZE); if (!st_map->image) { /* __bpf_struct_ops_map_free() uses st_map->image as flag * for "charged or not". In this case, we need to unchange @@ -711,7 +710,6 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr) } mutex_init(&st_map->lock); - set_vm_flush_reset_perms(st_map->image); bpf_map_init_from_attr(map, attr); return map; diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c index 99c7221a8933..f4b0df32a071 100644 --- a/kernel/bpf/trampoline.c +++ b/kernel/bpf/trampoline.c @@ -254,7 +254,7 @@ bpf_trampoline_get_progs(const struct bpf_trampoline *tr, int *total, bool *ip_a static void bpf_tramp_image_free(struct bpf_tramp_image *im) { bpf_image_ksym_del(&im->ksym); - bpf_jit_free_exec(im->image); + arch_free_bpf_trampoline(im->image, PAGE_SIZE); bpf_jit_uncharge_modmem(PAGE_SIZE); percpu_ref_exit(&im->pcref); kfree_rcu(im, rcu); @@ -365,10 +365,9 @@ static struct bpf_tramp_image *bpf_tramp_image_alloc(u64 key) goto out_free_im; err = -ENOMEM; - im->image = image = bpf_jit_alloc_exec(PAGE_SIZE); + im->image = image = arch_alloc_bpf_trampoline(PAGE_SIZE); if (!image) goto out_uncharge; - set_vm_flush_reset_perms(image); err = percpu_ref_init(&im->pcref, __bpf_tramp_image_release, 0, GFP_KERNEL); if (err) @@ -381,7 +380,7 @@ static struct bpf_tramp_image *bpf_tramp_image_alloc(u64 key) return im; out_free_image: - bpf_jit_free_exec(im->image); + arch_free_bpf_trampoline(im->image, PAGE_SIZE); out_uncharge: bpf_jit_uncharge_modmem(PAGE_SIZE); out_free_im: @@ -444,7 +443,7 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mut if (err < 0) goto out_free; - set_memory_rox((long)im->image, 1); + arch_protect_bpf_trampoline(im->image, PAGE_SIZE); WARN_ON(tr->cur_image && total == 0); if (tr->cur_image) @@ -461,8 +460,7 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mut * trampoline again, and retry register. */ /* reset im->image memory attr for arch_prepare_bpf_trampoline */ - set_memory_nx((long)im->image, 1); - set_memory_rw((long)im->image, 1); + arch_unprotect_bpf_trampoline(im->image, PAGE_SIZE); goto again; } #endif @@ -1068,6 +1066,40 @@ arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *image return -ENOTSUPP; } +void * __weak arch_alloc_bpf_trampoline(unsigned int size) +{ + void *image; + + if (WARN_ON_ONCE(size > PAGE_SIZE)) + return NULL; + image = bpf_jit_alloc_exec(PAGE_SIZE); + if (image) + set_vm_flush_reset_perms(image); + return image; +} + +void __weak arch_free_bpf_trampoline(void *image, unsigned int size) +{ + WARN_ON_ONCE(size > PAGE_SIZE); + /* bpf_jit_free_exec doesn't need "size", but + * bpf_prog_pack_free() needs it. 
+ */ + bpf_jit_free_exec(image); +} + +void __weak arch_protect_bpf_trampoline(void *image, unsigned int size) +{ + WARN_ON_ONCE(size > PAGE_SIZE); + set_memory_rox((long)image, 1); +} + +void __weak arch_unprotect_bpf_trampoline(void *image, unsigned int size) +{ + WARN_ON_ONCE(size > PAGE_SIZE); + set_memory_nx((long)image, 1); + set_memory_rw((long)image, 1); +} + static int __init init_trampolines(void) { int i; diff --git a/net/bpf/bpf_dummy_struct_ops.c b/net/bpf/bpf_dummy_struct_ops.c index 5918d1b32e19..2748f9d77b18 100644 --- a/net/bpf/bpf_dummy_struct_ops.c +++ b/net/bpf/bpf_dummy_struct_ops.c @@ -101,12 +101,11 @@ int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr, goto out; } - image = bpf_jit_alloc_exec(PAGE_SIZE); + image = arch_alloc_bpf_trampoline(PAGE_SIZE); if (!image) { err = -ENOMEM; goto out; } - set_vm_flush_reset_perms(image); link = kzalloc(sizeof(*link), GFP_USER); if (!link) { @@ -124,7 +123,7 @@ int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr, if (err < 0) goto out; - set_memory_rox((long)image, 1); + arch_protect_bpf_trampoline(image, PAGE_SIZE); prog_ret = dummy_ops_call_op(image, args); err = dummy_ops_copy_args(args); @@ -134,7 +133,7 @@ int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr, err = -EFAULT; out: kfree(args); - bpf_jit_free_exec(image); + arch_free_bpf_trampoline(image, PAGE_SIZE); if (link) bpf_link_put(&link->link); kfree(tlinks); -- 2.34.1
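For readers following the conversion, a minimal userspace sketch of the lifecycle these four helpers cover (allocate, write while still writable, protect, unprotect on retry paths, free) may help. It is only an illustration: malloc()/free() stand in for bpf_jit_alloc_exec()/bpf_jit_free_exec(), and the protect/unprotect steps are empty because set_memory_rox()/set_memory_rw() exist only in the kernel.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define IMAGE_SIZE 4096                /* stands in for PAGE_SIZE */

static void *demo_alloc_trampoline(unsigned int size)
{
        return size <= IMAGE_SIZE ? malloc(IMAGE_SIZE) : NULL;
}

static void demo_free_trampoline(void *image, unsigned int size)
{
        (void)size;                    /* free() ignores it; bpf_prog_pack_free() will not */
        free(image);
}

static void demo_protect_trampoline(void *image, unsigned int size)   { (void)image; (void)size; }
static void demo_unprotect_trampoline(void *image, unsigned int size) { (void)image; (void)size; }

int main(void)
{
        void *image = demo_alloc_trampoline(IMAGE_SIZE);

        if (!image)
                return 1;
        memset(image, 0xcc, IMAGE_SIZE);              /* "JIT" the trampoline */
        demo_protect_trampoline(image, IMAGE_SIZE);   /* image is now live */
        demo_unprotect_trampoline(image, IMAGE_SIZE); /* e.g. before regenerating on retry */
        demo_free_trampoline(image, IMAGE_SIZE);
        printf("alloc -> write -> protect -> unprotect -> free\n");
        return 0;
}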
From: Song Liu <song@kernel.org> mainline inclusion from mainline-v6.8-rc1 commit 38b8b58ae776bf748bd1bd7a24c3fd1d10f76f45 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- x86's implementation of arch_prepare_bpf_trampoline() requires BPF_INSN_SAFETY buffer space between end of program and image_end. OTOH, the return value does not include BPF_INSN_SAFETY. This doesn't cause any real issue at the moment. However, "image" of size retval is not enough for arch_prepare_bpf_trampoline(). This will cause confusion when we introduce a new helper arch_bpf_trampoline_size(). To avoid future confusion, adjust the return value to include BPF_INSN_SAFETY. Signed-off-by: Song Liu <song@kernel.org> Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20231206224054.492250-5-song@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- arch/x86/net/bpf_jit_comp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index fca7da66067e..6216072a9b64 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -2697,7 +2697,7 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i ret = -EFAULT; goto cleanup; } - ret = prog - (u8 *)image; + ret = prog - (u8 *)image + BPF_INSN_SAFETY; cleanup: kfree(branches); -- 2.34.1
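A quick worked example of the accounting change: once the pad is folded into the return value, a caller that allocates exactly the reported size still leaves the emitter its safety margin below image_end. The sketch below is standalone; the value 64 only mirrors the x86 JIT's BPF_INSN_SAFETY definition and is not a stable interface.

#include <assert.h>
#include <stdio.h>

#define BPF_INSN_SAFETY 64     /* illustrative, copied from the x86 JIT */

int main(void)
{
        unsigned int used = 1000;                   /* bytes actually emitted */
        unsigned int reported = used + BPF_INSN_SAFETY;
        unsigned int image_size = reported;         /* caller allocates exactly this */

        /* the emitter's check: code must stop BPF_INSN_SAFETY short of the end */
        assert(used <= image_size - BPF_INSN_SAFETY);
        printf("emitted=%u reported=%u\n", used, reported);
        return 0;
}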
From: Song Liu <song@kernel.org> mainline inclusion from mainline-v6.8-rc1 commit 96d1b7c081c0c96cbe8901045f4ff15a2e9974a2 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- This helper will be used to calculate the size of the trampoline before allocating the memory. arch_prepare_bpf_trampoline() for arm64 and riscv64 can use arch_bpf_trampoline_size() to check the trampoline fits in the image. OTOH, arch_prepare_bpf_trampoline() for s390 has to call the JIT process twice, so it cannot use arch_bpf_trampoline_size(). Signed-off-by: Song Liu <song@kernel.org> Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Tested-by: Ilya Leoshkevich <iii@linux.ibm.com> # on s390x Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Björn Töpel <bjorn@rivosinc.com> Tested-by: Björn Töpel <bjorn@rivosinc.com> # on riscv Link: https://lore.kernel.org/r/20231206224054.492250-6-song@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- arch/arm64/net/bpf_jit_comp.c | 56 ++++++++++++++++++++++++--------- arch/riscv/net/bpf_jit_comp64.c | 22 ++++++++++--- arch/s390/net/bpf_jit_comp.c | 56 ++++++++++++++++++++------------- arch/x86/net/bpf_jit_comp.c | 40 ++++++++++++++++++++--- include/linux/bpf.h | 2 ++ kernel/bpf/trampoline.c | 6 ++++ 6 files changed, 136 insertions(+), 46 deletions(-) diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c index 50d97e94fe63..098c057a0dd0 100644 --- a/arch/arm64/net/bpf_jit_comp.c +++ b/arch/arm64/net/bpf_jit_comp.c @@ -2105,18 +2105,10 @@ static int prepare_trampoline(struct jit_ctx *ctx, struct bpf_tramp_image *im, return ctx->idx; } -int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, - void *image_end, const struct btf_func_model *m, - u32 flags, struct bpf_tramp_links *tlinks, - void *func_addr) +static int btf_func_model_nregs(const struct btf_func_model *m) { - int i, ret; int nregs = m->nr_args; - int max_insns = ((long)image_end - (long)image) / AARCH64_INSN_SIZE; - struct jit_ctx ctx = { - .image = NULL, - .idx = 0, - }; + int i; /* extra registers needed for struct argument */ for (i = 0; i < MAX_BPF_FUNC_ARGS; i++) { @@ -2125,19 +2117,53 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, nregs += (m->arg_size[i] + 7) / 8 - 1; } + return nregs; +} + +int arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags, + struct bpf_tramp_links *tlinks, void *func_addr) +{ + struct jit_ctx ctx = { + .image = NULL, + .idx = 0, + }; + struct bpf_tramp_image im; + int nregs, ret; + + nregs = btf_func_model_nregs(m); /* the first 8 registers are used for arguments */ if (nregs > 8) return -ENOTSUPP; - ret = prepare_trampoline(&ctx, im, tlinks, func_addr, nregs, flags); + ret = prepare_trampoline(&ctx, &im, tlinks, func_addr, nregs, flags); if (ret < 0) return ret; - if (ret > max_insns) - return -EFBIG; + return ret < 0 ? 
ret : ret * AARCH64_INSN_SIZE; +} - ctx.image = image; - ctx.idx = 0; +int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, + void *image_end, const struct btf_func_model *m, + u32 flags, struct bpf_tramp_links *tlinks, + void *func_addr) +{ + int ret, nregs; + struct jit_ctx ctx = { + .image = image, + .idx = 0, + }; + + nregs = btf_func_model_nregs(m); + /* the first 8 registers are used for arguments */ + if (nregs > 8) + return -ENOTSUPP; + + ret = arch_bpf_trampoline_size(m, flags, tlinks, func_addr); + if (ret < 0) + return ret; + + if (ret > ((long)image_end - (long)image)) + return -EFBIG; jit_fill_hole(image, (unsigned int)(image_end - image)); ret = prepare_trampoline(&ctx, im, tlinks, func_addr, nregs, flags); diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c index 5426dc2697f9..4f046eb2b5cf 100644 --- a/arch/riscv/net/bpf_jit_comp64.c +++ b/arch/riscv/net/bpf_jit_comp64.c @@ -1029,6 +1029,21 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, return ret; } +int arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags, + struct bpf_tramp_links *tlinks, void *func_addr) +{ + struct bpf_tramp_image im; + struct rv_jit_context ctx; + int ret; + + ctx.ninsns = 0; + ctx.insns = NULL; + ctx.ro_insns = NULL; + ret = __arch_prepare_bpf_trampoline(&im, m, tlinks, func_addr, flags, &ctx); + + return ret < 0 ? ret : ninsns_rvoff(ctx.ninsns); +} + int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *image_end, const struct btf_func_model *m, u32 flags, struct bpf_tramp_links *tlinks, @@ -1037,14 +1052,11 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, int ret; struct rv_jit_context ctx; - ctx.ninsns = 0; - ctx.insns = NULL; - ctx.ro_insns = NULL; - ret = __arch_prepare_bpf_trampoline(im, m, tlinks, func_addr, flags, &ctx); + ret = arch_bpf_trampoline_size(im, m, flags, tlinks, func_addr); if (ret < 0) return ret; - if (ninsns_rvoff(ret) > (long)image_end - (long)image) + if (ret > (long)image_end - (long)image) return -EFBIG; ctx.ninsns = 0; diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c index 56143957def4..1f22da0ec420 100644 --- a/arch/s390/net/bpf_jit_comp.c +++ b/arch/s390/net/bpf_jit_comp.c @@ -2539,6 +2539,21 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, return 0; } +int arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags, + struct bpf_tramp_links *tlinks, void *orig_call) +{ + struct bpf_tramp_image im; + struct bpf_tramp_jit tjit; + int ret; + + memset(&tjit, 0, sizeof(tjit)); + + ret = __arch_prepare_bpf_trampoline(&im, &tjit, m, flags, + tlinks, orig_call); + + return ret < 0 ? ret : tjit.common.prg; +} + int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *image_end, const struct btf_func_model *m, u32 flags, struct bpf_tramp_links *tlinks, @@ -2546,30 +2561,27 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, { struct bpf_tramp_jit tjit; int ret; - int i; - for (i = 0; i < 2; i++) { - if (i == 0) { - /* Compute offsets, check whether the code fits. */ - memset(&tjit, 0, sizeof(tjit)); - } else { - /* Generate the code. */ - tjit.common.prg = 0; - tjit.common.prg_buf = image; - } - ret = __arch_prepare_bpf_trampoline(im, &tjit, m, flags, - tlinks, func_addr); - if (ret < 0) - return ret; - if (tjit.common.prg > (char *)image_end - (char *)image) - /* - * Use the same error code as for exceeding - * BPF_MAX_TRAMP_LINKS. 
- */ - return -E2BIG; - } + /* Compute offsets, check whether the code fits. */ + memset(&tjit, 0, sizeof(tjit)); + ret = __arch_prepare_bpf_trampoline(im, &tjit, m, flags, + tlinks, func_addr); + + if (ret < 0) + return ret; + if (tjit.common.prg > (char *)image_end - (char *)image) + /* + * Use the same error code as for exceeding + * BPF_MAX_TRAMP_LINKS. + */ + return -E2BIG; + + tjit.common.prg = 0; + tjit.common.prg_buf = image; + ret = __arch_prepare_bpf_trampoline(im, &tjit, m, flags, + tlinks, func_addr); - return tjit.common.prg; + return ret < 0 ? ret : tjit.common.prg; } bool bpf_jit_supports_subprog_tailcalls(void) diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index 6216072a9b64..887606a47a88 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -2448,10 +2448,10 @@ static int invoke_bpf_mod_ret(const struct btf_func_model *m, u8 **pprog, * add rsp, 8 // skip eth_type_trans's frame * ret // return to its caller */ -int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *image_end, - const struct btf_func_model *m, u32 flags, - struct bpf_tramp_links *tlinks, - void *func_addr) +static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *image_end, + const struct btf_func_model *m, u32 flags, + struct bpf_tramp_links *tlinks, + void *func_addr) { int i, ret, nr_regs = m->nr_args, stack_size = 0; int regs_off, nregs_off, ip_off, run_ctx_off, arg_stack_off, rbx_off; @@ -2704,6 +2704,38 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i return ret; } +int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *image_end, + const struct btf_func_model *m, u32 flags, + struct bpf_tramp_links *tlinks, + void *func_addr) +{ + return __arch_prepare_bpf_trampoline(im, image, image_end, m, flags, tlinks, func_addr); +} + +int arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags, + struct bpf_tramp_links *tlinks, void *func_addr) +{ + struct bpf_tramp_image im; + void *image; + int ret; + + /* Allocate a temporary buffer for __arch_prepare_bpf_trampoline(). + * This will NOT cause fragmentation in direct map, as we do not + * call set_memory_*() on this buffer. + * + * We cannot use kvmalloc here, because we need image to be in + * module memory range. 
+ */ + image = bpf_jit_alloc_exec(PAGE_SIZE); + if (!image) + return -ENOMEM; + + ret = __arch_prepare_bpf_trampoline(&im, image, image + PAGE_SIZE, m, flags, + tlinks, func_addr); + bpf_jit_free_exec(image); + return ret; +} + static int emit_bpf_dispatcher(u8 **pprog, int a, int b, s64 *progs, u8 *image, u8 *buf) { u8 *jg_reloc, *prog = *pprog; diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 689dd882c791..32a3aa04f4b5 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1159,6 +1159,8 @@ void *arch_alloc_bpf_trampoline(unsigned int size); void arch_free_bpf_trampoline(void *image, unsigned int size); void arch_protect_bpf_trampoline(void *image, unsigned int size); void arch_unprotect_bpf_trampoline(void *image, unsigned int size); +int arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags, + struct bpf_tramp_links *tlinks, void *func_addr); u64 notrace __bpf_prog_enter_sleepable_recur(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx); diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c index f4b0df32a071..1540072cd7fa 100644 --- a/kernel/bpf/trampoline.c +++ b/kernel/bpf/trampoline.c @@ -1100,6 +1100,12 @@ void __weak arch_unprotect_bpf_trampoline(void *image, unsigned int size) set_memory_rw((long)image, 1); } +int __weak arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags, + struct bpf_tramp_links *tlinks, void *func_addr) +{ + return -ENOTSUPP; +} + static int __init init_trampolines(void) { int i; -- 2.34.1
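The sizing helper works because each JIT converted here can run its emitter in a dry-run mode that only counts bytes: arm64 with ctx.image == NULL, riscv with ctx.insns == NULL, s390 with a zeroed tjit (no prg_buf), and x86 by emitting into a throwaway buffer. A self-contained sketch of that measure-then-emit pattern, with malloc() standing in for the trampoline allocator, looks like this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct ctx {
        unsigned char *buf;     /* NULL => sizing pass, only count bytes */
        unsigned int len;
};

static void emit(struct ctx *c, const void *insn, unsigned int n)
{
        if (c->buf)
                memcpy(c->buf + c->len, insn, n);
        c->len += n;
}

static int build_trampoline(struct ctx *c)
{
        static const unsigned char nop5[5] = { 0x0f, 0x1f, 0x44, 0x00, 0x00 };

        emit(c, nop5, sizeof(nop5));    /* pretend prologue */
        emit(c, nop5, sizeof(nop5));    /* pretend body */
        return c->len;
}

int main(void)
{
        struct ctx c = { .buf = NULL, .len = 0 };
        int size = build_trampoline(&c);        /* pass 1: measure */
        unsigned char *image = malloc(size);

        if (!image)
                return 1;
        c.buf = image;
        c.len = 0;
        build_trampoline(&c);                   /* pass 2: emit into a right-sized image */
        printf("trampoline size: %d bytes\n", size);
        free(image);
        return 0;
}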
From: Song Liu <song@kernel.org> mainline inclusion from mainline-v6.8-rc1 commit 26ef208c209a0e6eed8942a5d191b39dccfa6e38 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Instead of blindly allocating PAGE_SIZE for each trampoline, check the size of the trampoline with arch_bpf_trampoline_size(). This size is saved in bpf_tramp_image->size, and used for modmem charge/uncharge. The fallback arch_alloc_bpf_trampoline() still allocates a whole page because we need to use set_memory_* to protect the memory. struct_ops trampoline still uses a whole page for multiple trampolines. With this size check at caller (regular trampoline and struct_ops trampoline), remove arch_bpf_trampoline_size() from arch_prepare_bpf_trampoline() in archs. Also, update bpf_image_ksym_add() to handle symbol of different sizes. Signed-off-by: Song Liu <song@kernel.org> Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Tested-by: Ilya Leoshkevich <iii@linux.ibm.com> # on s390x Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Björn Töpel <bjorn@rivosinc.com> Tested-by: Björn Töpel <bjorn@rivosinc.com> # on riscv Link: https://lore.kernel.org/r/20231206224054.492250-7-song@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Conflicts: kernel/bpf/trampoline.c [Fix conflict because lts patch c6ef788c2ca6 is merged.] Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- arch/arm64/net/bpf_jit_comp.c | 7 ----- arch/riscv/net/bpf_jit_comp64.c | 7 ----- include/linux/bpf.h | 3 +- kernel/bpf/bpf_struct_ops.c | 7 +++++ kernel/bpf/dispatcher.c | 2 +- kernel/bpf/trampoline.c | 54 ++++++++++++++++++++------------- 6 files changed, 43 insertions(+), 37 deletions(-) diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c index 098c057a0dd0..057828e9b3f8 100644 --- a/arch/arm64/net/bpf_jit_comp.c +++ b/arch/arm64/net/bpf_jit_comp.c @@ -2158,13 +2158,6 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, if (nregs > 8) return -ENOTSUPP; - ret = arch_bpf_trampoline_size(m, flags, tlinks, func_addr); - if (ret < 0) - return ret; - - if (ret > ((long)image_end - (long)image)) - return -EFBIG; - jit_fill_hole(image, (unsigned int)(image_end - image)); ret = prepare_trampoline(&ctx, im, tlinks, func_addr, nregs, flags); diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c index 4f046eb2b5cf..4a1a61ae14d6 100644 --- a/arch/riscv/net/bpf_jit_comp64.c +++ b/arch/riscv/net/bpf_jit_comp64.c @@ -1052,13 +1052,6 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, int ret; struct rv_jit_context ctx; - ret = arch_bpf_trampoline_size(im, m, flags, tlinks, func_addr); - if (ret < 0) - return ret; - - if (ret > (long)image_end - (long)image) - return -EFBIG; - ctx.ninsns = 0; /* * The bpf_int_jit_compile() uses a RW buffer (ctx.insns) to write the diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 32a3aa04f4b5..461e9451c3e4 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1194,6 +1194,7 @@ enum bpf_tramp_prog_type { struct bpf_tramp_image { void *image; + int size; struct bpf_ksym ksym; struct percpu_ref pcref; void *ip_after_call; @@ -1395,7 +1396,7 @@ int arch_prepare_bpf_dispatcher(void *image, void *buf, s64 *funcs, int num_func void bpf_dispatcher_change_prog(struct bpf_dispatcher *d, struct bpf_prog *from, struct bpf_prog *to); 
/* Called only from JIT-enabled code, so there's no need for stubs. */ -void bpf_image_ksym_add(void *data, struct bpf_ksym *ksym); +void bpf_image_ksym_add(void *data, unsigned int size, struct bpf_ksym *ksym); void bpf_image_ksym_del(struct bpf_ksym *ksym); void bpf_ksym_add(struct bpf_ksym *ksym); void bpf_ksym_del(struct bpf_ksym *ksym); diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c index e9e95879bce2..4d53c53fc5aa 100644 --- a/kernel/bpf/bpf_struct_ops.c +++ b/kernel/bpf/bpf_struct_ops.c @@ -355,6 +355,7 @@ int bpf_struct_ops_prepare_trampoline(struct bpf_tramp_links *tlinks, void *image, void *image_end) { u32 flags; + int size; tlinks[BPF_TRAMP_FENTRY].links[0] = link; tlinks[BPF_TRAMP_FENTRY].nr_links = 1; @@ -362,6 +363,12 @@ int bpf_struct_ops_prepare_trampoline(struct bpf_tramp_links *tlinks, * and it must be used alone. */ flags = model->ret_size > 0 ? BPF_TRAMP_F_RET_FENTRY_RET : 0; + + size = arch_bpf_trampoline_size(model, flags, tlinks, NULL); + if (size < 0) + return size; + if (size > (unsigned long)image_end - (unsigned long)image) + return -E2BIG; return arch_prepare_bpf_trampoline(NULL, image, image_end, model, flags, tlinks, NULL); } diff --git a/kernel/bpf/dispatcher.c b/kernel/bpf/dispatcher.c index 56760fc10e78..70fb82bf1637 100644 --- a/kernel/bpf/dispatcher.c +++ b/kernel/bpf/dispatcher.c @@ -154,7 +154,7 @@ void bpf_dispatcher_change_prog(struct bpf_dispatcher *d, struct bpf_prog *from, d->image = NULL; goto out; } - bpf_image_ksym_add(d->image, &d->ksym); + bpf_image_ksym_add(d->image, PAGE_SIZE, &d->ksym); } prev_num_progs = d->num_progs; diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c index 1540072cd7fa..ba4280057f49 100644 --- a/kernel/bpf/trampoline.c +++ b/kernel/bpf/trampoline.c @@ -115,10 +115,10 @@ bool bpf_prog_has_trampoline(const struct bpf_prog *prog) (ptype == BPF_PROG_TYPE_LSM && eatype == BPF_LSM_MAC); } -void bpf_image_ksym_add(void *data, struct bpf_ksym *ksym) +void bpf_image_ksym_add(void *data, unsigned int size, struct bpf_ksym *ksym) { ksym->start = (unsigned long) data; - ksym->end = ksym->start + PAGE_SIZE; + ksym->end = ksym->start + size; bpf_ksym_add(ksym); perf_event_ksymbol(PERF_RECORD_KSYMBOL_TYPE_BPF, ksym->start, PAGE_SIZE, false, ksym->name); @@ -254,8 +254,8 @@ bpf_trampoline_get_progs(const struct bpf_trampoline *tr, int *total, bool *ip_a static void bpf_tramp_image_free(struct bpf_tramp_image *im) { bpf_image_ksym_del(&im->ksym); - arch_free_bpf_trampoline(im->image, PAGE_SIZE); - bpf_jit_uncharge_modmem(PAGE_SIZE); + arch_free_bpf_trampoline(im->image, im->size); + bpf_jit_uncharge_modmem(im->size); percpu_ref_exit(&im->pcref); kfree_rcu(im, rcu); } @@ -349,7 +349,7 @@ static void bpf_tramp_image_put(struct bpf_tramp_image *im) call_rcu_tasks_trace(&im->rcu, __bpf_tramp_image_put_rcu_tasks); } -static struct bpf_tramp_image *bpf_tramp_image_alloc(u64 key) +static struct bpf_tramp_image *bpf_tramp_image_alloc(u64 key, int size) { struct bpf_tramp_image *im; struct bpf_ksym *ksym; @@ -360,12 +360,13 @@ static struct bpf_tramp_image *bpf_tramp_image_alloc(u64 key) if (!im) goto out; - err = bpf_jit_charge_modmem(PAGE_SIZE); + err = bpf_jit_charge_modmem(size); if (err) goto out_free_im; + im->size = size; err = -ENOMEM; - im->image = image = arch_alloc_bpf_trampoline(PAGE_SIZE); + im->image = image = arch_alloc_bpf_trampoline(size); if (!image) goto out_uncharge; @@ -376,13 +377,13 @@ static struct bpf_tramp_image *bpf_tramp_image_alloc(u64 key) ksym = &im->ksym; 
INIT_LIST_HEAD_RCU(&ksym->lnode); snprintf(ksym->name, KSYM_NAME_LEN, "bpf_trampoline_%llu", key); - bpf_image_ksym_add(image, ksym); + bpf_image_ksym_add(image, size, ksym); return im; out_free_image: - arch_free_bpf_trampoline(im->image, PAGE_SIZE); + arch_free_bpf_trampoline(im->image, im->size); out_uncharge: - bpf_jit_uncharge_modmem(PAGE_SIZE); + bpf_jit_uncharge_modmem(size); out_free_im: kfree(im); out: @@ -395,7 +396,7 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mut struct bpf_tramp_links *tlinks; u32 orig_flags = tr->flags; bool ip_arg = false; - int err, total; + int err, total, size; tlinks = bpf_trampoline_get_progs(tr, &total, &ip_arg); if (IS_ERR(tlinks)) @@ -408,12 +409,6 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mut goto out; } - im = bpf_tramp_image_alloc(tr->key); - if (IS_ERR(im)) { - err = PTR_ERR(im); - goto out; - } - /* clear all bits except SHARE_IPMODIFY and TAIL_CALL_CTX */ tr->flags &= (BPF_TRAMP_F_SHARE_IPMODIFY | BPF_TRAMP_F_TAIL_CALL_CTX); @@ -437,13 +432,31 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mut tr->flags |= BPF_TRAMP_F_ORIG_STACK; #endif - err = arch_prepare_bpf_trampoline(im, im->image, im->image + PAGE_SIZE, + size = arch_bpf_trampoline_size(&tr->func.model, tr->flags, + tlinks, tr->func.addr); + if (size < 0) { + err = size; + goto out; + } + + if (size > PAGE_SIZE) { + err = -E2BIG; + goto out; + } + + im = bpf_tramp_image_alloc(tr->key, size); + if (IS_ERR(im)) { + err = PTR_ERR(im); + goto out; + } + + err = arch_prepare_bpf_trampoline(im, im->image, im->image + size, &tr->func.model, tr->flags, tlinks, tr->func.addr); if (err < 0) goto out_free; - arch_protect_bpf_trampoline(im->image, PAGE_SIZE); + arch_protect_bpf_trampoline(im->image, im->size); WARN_ON(tr->cur_image && total == 0); if (tr->cur_image) @@ -459,8 +472,7 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mut * BPF_TRAMP_F_SHARE_IPMODIFY is set, we can generate the * trampoline again, and retry register. */ - /* reset im->image memory attr for arch_prepare_bpf_trampoline */ - arch_unprotect_bpf_trampoline(im->image, PAGE_SIZE); + bpf_tramp_image_free(im); goto again; } #endif -- 2.34.1
hulk inclusion category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA -------------------------------- Fix kabi-breakage by using KABI_USE. Fixes: 7c2f256d4f9a ("bpf: Use arch_bpf_trampoline_size") Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- include/linux/bpf.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 461e9451c3e4..533148ccb549 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1194,7 +1194,6 @@ enum bpf_tramp_prog_type { struct bpf_tramp_image { void *image; - int size; struct bpf_ksym ksym; struct percpu_ref pcref; void *ip_after_call; @@ -1204,7 +1203,7 @@ struct bpf_tramp_image { struct work_struct work; }; - KABI_RESERVE(1) + KABI_USE(1, int size) KABI_RESERVE(2) KABI_RESERVE(3) KABI_RESERVE(4) -- 2.34.1
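For context on the macro pair used here, the standalone model below captures the usual reserve/use scheme: a pre-reserved 64-bit slot is wrapped in a union so the new member becomes addressable without changing the structure's size or layout. The real definitions live in include/linux/kabi.h and may differ in detail; this is only an illustration of why the change is kABI-safe.

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_KABI_RESERVE(n)    uint64_t demo_kabi_reserved##n;
#define DEMO_KABI_USE(n, _new)                          \
        union {                                         \
                _new;                                   \
                uint64_t demo_kabi_reserved##n;         \
        };

struct before {                 /* models the KABI_RESERVE(1) layout */
        void *image;
        DEMO_KABI_RESERVE(1)
};

struct after {                  /* models the KABI_USE(1, int size) layout */
        void *image;
        DEMO_KABI_USE(1, int size)
};

int main(void)
{
        /* same size and same offsets: the binary layout is unchanged */
        assert(sizeof(struct before) == sizeof(struct after));
        assert(offsetof(struct before, demo_kabi_reserved1) ==
               offsetof(struct after, demo_kabi_reserved1));
        return 0;
}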
From: Song Liu <song@kernel.org> mainline inclusion from mainline-v6.8-rc1 commit 3ba026fca8786161b0c4d75be396e61d6816e0a1 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- There are three major changes here: 1. Add arch_[alloc|free]_bpf_trampoline based on bpf_prog_pack; 2. Let arch_prepare_bpf_trampoline handle ROX input image, this requires arch_prepare_bpf_trampoline allocating a temporary RW buffer; 3. Update __arch_prepare_bpf_trampoline() to handle a RW buffer (rw_image) and a ROX buffer (image). This part is similar to the image/rw_image logic in bpf_int_jit_compile(). Signed-off-by: Song Liu <song@kernel.org> Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20231206224054.492250-8-song@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- arch/x86/net/bpf_jit_comp.c | 98 +++++++++++++++++++++++++++---------- 1 file changed, 72 insertions(+), 26 deletions(-) diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index 887606a47a88..a4f84b7e89b8 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -2224,7 +2224,8 @@ static void restore_regs(const struct btf_func_model *m, u8 **prog, static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog, struct bpf_tramp_link *l, int stack_size, - int run_ctx_off, bool save_ret) + int run_ctx_off, bool save_ret, + void *image, void *rw_image) { u8 *prog = *pprog; u8 *jmp_insn; @@ -2252,7 +2253,7 @@ static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog, else EMIT4(0x48, 0x8D, 0x75, -run_ctx_off); - if (emit_rsb_call(&prog, bpf_trampoline_enter(p), prog)) + if (emit_rsb_call(&prog, bpf_trampoline_enter(p), image + (prog - (u8 *)rw_image))) return -EINVAL; /* remember prog start time returned by __bpf_prog_enter */ emit_mov_reg(&prog, true, BPF_REG_6, BPF_REG_0); @@ -2276,7 +2277,7 @@ static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog, (long) p->insnsi >> 32, (u32) (long) p->insnsi); /* call JITed bpf program or interpreter */ - if (emit_rsb_call(&prog, p->bpf_func, prog)) + if (emit_rsb_call(&prog, p->bpf_func, image + (prog - (u8 *)rw_image))) return -EINVAL; /* @@ -2303,7 +2304,7 @@ static int invoke_bpf_prog(const struct btf_func_model *m, u8 **pprog, EMIT3_off32(0x48, 0x8D, 0x95, -run_ctx_off); else EMIT4(0x48, 0x8D, 0x55, -run_ctx_off); - if (emit_rsb_call(&prog, bpf_trampoline_exit(p), prog)) + if (emit_rsb_call(&prog, bpf_trampoline_exit(p), image + (prog - (u8 *)rw_image))) return -EINVAL; *pprog = prog; @@ -2338,14 +2339,15 @@ static int emit_cond_near_jump(u8 **pprog, void *func, void *ip, u8 jmp_cond) static int invoke_bpf(const struct btf_func_model *m, u8 **pprog, struct bpf_tramp_links *tl, int stack_size, - int run_ctx_off, bool save_ret) + int run_ctx_off, bool save_ret, + void *image, void *rw_image) { int i; u8 *prog = *pprog; for (i = 0; i < tl->nr_links; i++) { if (invoke_bpf_prog(m, &prog, tl->links[i], stack_size, - run_ctx_off, save_ret)) + run_ctx_off, save_ret, image, rw_image)) return -EINVAL; } *pprog = prog; @@ -2354,7 +2356,8 @@ static int invoke_bpf(const struct btf_func_model *m, u8 **pprog, static int invoke_bpf_mod_ret(const struct btf_func_model *m, u8 **pprog, struct bpf_tramp_links *tl, int stack_size, 
- int run_ctx_off, u8 **branches) + int run_ctx_off, u8 **branches, + void *image, void *rw_image) { u8 *prog = *pprog; int i; @@ -2365,7 +2368,8 @@ static int invoke_bpf_mod_ret(const struct btf_func_model *m, u8 **pprog, emit_mov_imm32(&prog, false, BPF_REG_0, 0); emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -8); for (i = 0; i < tl->nr_links; i++) { - if (invoke_bpf_prog(m, &prog, tl->links[i], stack_size, run_ctx_off, true)) + if (invoke_bpf_prog(m, &prog, tl->links[i], stack_size, run_ctx_off, true, + image, rw_image)) return -EINVAL; /* mod_ret prog stored return value into [rbp - 8]. Emit: @@ -2448,7 +2452,8 @@ static int invoke_bpf_mod_ret(const struct btf_func_model *m, u8 **pprog, * add rsp, 8 // skip eth_type_trans's frame * ret // return to its caller */ -static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *image_end, +static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_image, + void *rw_image_end, void *image, const struct btf_func_model *m, u32 flags, struct bpf_tramp_links *tlinks, void *func_addr) @@ -2547,7 +2552,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image orig_call += X86_PATCH_SIZE; } - prog = image; + prog = rw_image; EMIT_ENDBR(); /* @@ -2589,7 +2594,8 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image if (flags & BPF_TRAMP_F_CALL_ORIG) { /* arg1: mov rdi, im */ emit_mov_imm64(&prog, BPF_REG_1, (long) im >> 32, (u32) (long) im); - if (emit_rsb_call(&prog, __bpf_tramp_enter, prog)) { + if (emit_rsb_call(&prog, __bpf_tramp_enter, + image + (prog - (u8 *)rw_image))) { ret = -EINVAL; goto cleanup; } @@ -2597,7 +2603,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image if (fentry->nr_links) if (invoke_bpf(m, &prog, fentry, regs_off, run_ctx_off, - flags & BPF_TRAMP_F_RET_FENTRY_RET)) + flags & BPF_TRAMP_F_RET_FENTRY_RET, image, rw_image)) return -EINVAL; if (fmod_ret->nr_links) { @@ -2607,7 +2613,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image return -ENOMEM; if (invoke_bpf_mod_ret(m, &prog, fmod_ret, regs_off, - run_ctx_off, branches)) { + run_ctx_off, branches, image, rw_image)) { ret = -EINVAL; goto cleanup; } @@ -2628,14 +2634,14 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image EMIT2(0xff, 0xd3); /* call *rbx */ } else { /* call original function */ - if (emit_rsb_call(&prog, orig_call, prog)) { + if (emit_rsb_call(&prog, orig_call, image + (prog - (u8 *)rw_image))) { ret = -EINVAL; goto cleanup; } } /* remember return value in a stack for bpf prog to access */ emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -8); - im->ip_after_call = prog; + im->ip_after_call = image + (prog - (u8 *)rw_image); memcpy(prog, x86_nops[5], X86_PATCH_SIZE); prog += X86_PATCH_SIZE; } @@ -2651,12 +2657,13 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image * aligned address of do_fexit. 
*/ for (i = 0; i < fmod_ret->nr_links; i++) - emit_cond_near_jump(&branches[i], prog, branches[i], - X86_JNE); + emit_cond_near_jump(&branches[i], image + (prog - (u8 *)rw_image), + image + (branches[i] - (u8 *)rw_image), X86_JNE); } if (fexit->nr_links) - if (invoke_bpf(m, &prog, fexit, regs_off, run_ctx_off, false)) { + if (invoke_bpf(m, &prog, fexit, regs_off, run_ctx_off, + false, image, rw_image)) { ret = -EINVAL; goto cleanup; } @@ -2669,10 +2676,10 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image * restored to R0. */ if (flags & BPF_TRAMP_F_CALL_ORIG) { - im->ip_epilogue = prog; + im->ip_epilogue = image + (prog - (u8 *)rw_image); /* arg1: mov rdi, im */ emit_mov_imm64(&prog, BPF_REG_1, (long) im >> 32, (u32) (long) im); - if (emit_rsb_call(&prog, __bpf_tramp_exit, prog)) { + if (emit_rsb_call(&prog, __bpf_tramp_exit, image + (prog - (u8 *)rw_image))) { ret = -EINVAL; goto cleanup; } @@ -2691,25 +2698,64 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image if (flags & BPF_TRAMP_F_SKIP_FRAME) /* skip our return address and return to parent */ EMIT4(0x48, 0x83, 0xC4, 8); /* add rsp, 8 */ - emit_return(&prog, prog); + emit_return(&prog, image + (prog - (u8 *)rw_image)); /* Make sure the trampoline generation logic doesn't overflow */ - if (WARN_ON_ONCE(prog > (u8 *)image_end - BPF_INSN_SAFETY)) { + if (WARN_ON_ONCE(prog > (u8 *)rw_image_end - BPF_INSN_SAFETY)) { ret = -EFAULT; goto cleanup; } - ret = prog - (u8 *)image + BPF_INSN_SAFETY; + ret = prog - (u8 *)rw_image + BPF_INSN_SAFETY; cleanup: kfree(branches); return ret; } +void *arch_alloc_bpf_trampoline(unsigned int size) +{ + return bpf_prog_pack_alloc(size, jit_fill_hole); +} + +void arch_free_bpf_trampoline(void *image, unsigned int size) +{ + bpf_prog_pack_free(image, size); +} + +void arch_protect_bpf_trampoline(void *image, unsigned int size) +{ +} + +void arch_unprotect_bpf_trampoline(void *image, unsigned int size) +{ +} + int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *image_end, const struct btf_func_model *m, u32 flags, struct bpf_tramp_links *tlinks, void *func_addr) { - return __arch_prepare_bpf_trampoline(im, image, image_end, m, flags, tlinks, func_addr); + void *rw_image, *tmp; + int ret; + u32 size = image_end - image; + + /* rw_image doesn't need to be in module memory range, so we can + * use kvmalloc. + */ + rw_image = kvmalloc(size, GFP_KERNEL); + if (!rw_image) + return -ENOMEM; + + ret = __arch_prepare_bpf_trampoline(im, rw_image, rw_image + size, image, m, + flags, tlinks, func_addr); + if (ret < 0) + goto out; + + tmp = bpf_arch_text_copy(image, rw_image, size); + if (IS_ERR(tmp)) + ret = PTR_ERR(tmp); +out: + kvfree(rw_image); + return ret; } int arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags, @@ -2730,8 +2776,8 @@ int arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags, if (!image) return -ENOMEM; - ret = __arch_prepare_bpf_trampoline(&im, image, image + PAGE_SIZE, m, flags, - tlinks, func_addr); + ret = __arch_prepare_bpf_trampoline(&im, image, image + PAGE_SIZE, image, + m, flags, tlinks, func_addr); bpf_jit_free_exec(image); return ret; } -- 2.34.1
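The recurring expression image + (prog - (u8 *)rw_image) in the hunk above is the core of the change: instructions are emitted into a plain writable buffer, but any address that is baked into the generated code or remembered for later patching (call targets, im->ip_after_call, im->ip_epilogue) has to refer to the final executable location the buffer will be copied to. A standalone illustration of the translation, with an arbitrary made-up final address:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
        uint8_t rw_image[64];                   /* writable scratch buffer */
        uint8_t *image = (uint8_t *)0x1000;     /* pretend final ROX location */
        uint8_t *prog = rw_image + 17;          /* emitter is 17 bytes in */

        /* where that byte will live once the buffer is copied into place */
        uint8_t *final = image + (prog - rw_image);

        printf("offset %td maps to final address %p\n", prog - rw_image, (void *)final);
        return 0;
}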
From: Peter Zijlstra <peterz@infradead.org> mainline inclusion from mainline-v6.8-rc1 commit 4382159696c9af67ee047ed55f2dbf05480f52f6 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Normal include order is that linux/foo.h should include asm/foo.h, CFI has it the wrong way around. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Sami Tolvanen <samitolvanen@google.com> Link: https://lore.kernel.org/r/20231215092707.231038174@infradead.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- arch/riscv/include/asm/cfi.h | 3 ++- arch/riscv/kernel/cfi.c | 2 +- arch/x86/include/asm/cfi.h | 3 ++- arch/x86/kernel/cfi.c | 4 ++-- include/asm-generic/Kbuild | 1 + include/asm-generic/cfi.h | 5 +++++ include/linux/cfi.h | 1 + 7 files changed, 14 insertions(+), 5 deletions(-) create mode 100644 include/asm-generic/cfi.h diff --git a/arch/riscv/include/asm/cfi.h b/arch/riscv/include/asm/cfi.h index 56bf9d69d5e3..8f7a62257044 100644 --- a/arch/riscv/include/asm/cfi.h +++ b/arch/riscv/include/asm/cfi.h @@ -7,8 +7,9 @@ * * Copyright (C) 2023 Google LLC */ +#include <linux/bug.h> -#include <linux/cfi.h> +struct pt_regs; #ifdef CONFIG_CFI_CLANG enum bug_trap_type handle_cfi_failure(struct pt_regs *regs); diff --git a/arch/riscv/kernel/cfi.c b/arch/riscv/kernel/cfi.c index 820158d7a291..6ec9dbd7292e 100644 --- a/arch/riscv/kernel/cfi.c +++ b/arch/riscv/kernel/cfi.c @@ -4,7 +4,7 @@ * * Copyright (C) 2023 Google LLC */ -#include <asm/cfi.h> +#include <linux/cfi.h> #include <asm/insn.h> /* diff --git a/arch/x86/include/asm/cfi.h b/arch/x86/include/asm/cfi.h index 58dacd90daef..2a494643089d 100644 --- a/arch/x86/include/asm/cfi.h +++ b/arch/x86/include/asm/cfi.h @@ -7,8 +7,9 @@ * * Copyright (C) 2022 Google LLC */ +#include <linux/bug.h> -#include <linux/cfi.h> +struct pt_regs; #ifdef CONFIG_CFI_CLANG enum bug_trap_type handle_cfi_failure(struct pt_regs *regs); diff --git a/arch/x86/kernel/cfi.c b/arch/x86/kernel/cfi.c index 8674a5c0c031..e6bf78fac146 100644 --- a/arch/x86/kernel/cfi.c +++ b/arch/x86/kernel/cfi.c @@ -4,10 +4,10 @@ * * Copyright (C) 2022 Google LLC */ -#include <asm/cfi.h> +#include <linux/string.h> +#include <linux/cfi.h> #include <asm/insn.h> #include <asm/insn-eval.h> -#include <linux/string.h> /* * Returns the target address and the expected type when regs->ip points diff --git a/include/asm-generic/Kbuild b/include/asm-generic/Kbuild index 941be574bbe0..ca2be8eaba5e 100644 --- a/include/asm-generic/Kbuild +++ b/include/asm-generic/Kbuild @@ -11,6 +11,7 @@ mandatory-y += bitops.h mandatory-y += bug.h mandatory-y += bugs.h mandatory-y += cacheflush.h +mandatory-y += cfi.h mandatory-y += checksum.h mandatory-y += compat.h mandatory-y += current.h diff --git a/include/asm-generic/cfi.h b/include/asm-generic/cfi.h new file mode 100644 index 000000000000..41fac3537bf9 --- /dev/null +++ b/include/asm-generic/cfi.h @@ -0,0 +1,5 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __ASM_GENERIC_CFI_H +#define __ASM_GENERIC_CFI_H + +#endif /* __ASM_GENERIC_CFI_H */ diff --git a/include/linux/cfi.h b/include/linux/cfi.h index 3552ec82b725..2309d74e77e6 100644 --- a/include/linux/cfi.h +++ b/include/linux/cfi.h @@ -9,6 +9,7 @@ #include <linux/bug.h> #include <linux/module.h> +#include <asm/cfi.h> #ifdef CONFIG_CFI_CLANG enum 
bug_trap_type report_cfi_failure(struct pt_regs *regs, unsigned long addr, -- 2.34.1
From: Peter Zijlstra <peterz@infradead.org> mainline inclusion from mainline-v6.8-rc1 commit 4f9087f16651aca4a5f32da840a53f6660f0579a category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- The current BPF call convention is __nocfi, except when it calls !JIT things, then it calls regular C functions. It so happens that with FineIBT the __nocfi and C calling conventions are incompatible. Specifically __nocfi will call at func+0, while FineIBT will have endbr-poison there, which is not a valid indirect target. Causing #CP. Notably this only triggers on IBT enabled hardware, which is probably why this hasn't been reported (also, most people will have JIT on anyway). Implement proper CFI prologues for the BPF JIT codegen and drop __nocfi for x86. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20231215092707.345270396@infradead.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Conflicts: arch/x86/net/bpf_jit_comp.c include/linux/bpf.h kernel/bpf/core.c [Fix conflict because of kabi and cev patches db775a8ebf140, and pre-patches fd5d27b70188 is not existed.] Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- arch/x86/include/asm/cfi.h | 110 ++++++++++++++++++++++++++++++++++ arch/x86/kernel/alternative.c | 47 ++++++++++++--- arch/x86/net/bpf_jit_comp.c | 82 +++++++++++++++++++++++-- include/linux/bpf.h | 13 +++- include/linux/cfi.h | 7 +++ kernel/bpf/core.c | 25 ++++++++ 6 files changed, 270 insertions(+), 14 deletions(-) diff --git a/arch/x86/include/asm/cfi.h b/arch/x86/include/asm/cfi.h index 2a494643089d..7a7b0b823a98 100644 --- a/arch/x86/include/asm/cfi.h +++ b/arch/x86/include/asm/cfi.h @@ -9,15 +9,125 @@ */ #include <linux/bug.h> +/* + * An overview of the various calling conventions... + * + * Traditional: + * + * foo: + * ... code here ... + * ret + * + * direct caller: + * call foo + * + * indirect caller: + * lea foo(%rip), %r11 + * ... + * call *%r11 + * + * + * IBT: + * + * foo: + * endbr64 + * ... code here ... + * ret + * + * direct caller: + * call foo / call foo+4 + * + * indirect caller: + * lea foo(%rip), %r11 + * ... + * call *%r11 + * + * + * kCFI: + * + * __cfi_foo: + * movl $0x12345678, %eax + * # 11 nops when CONFIG_CALL_PADDING + * foo: + * endbr64 # when IBT + * ... code here ... + * ret + * + * direct call: + * call foo # / call foo+4 when IBT + * + * indirect call: + * lea foo(%rip), %r11 + * ... + * movl $(-0x12345678), %r10d + * addl -4(%r11), %r10d # -15 when CONFIG_CALL_PADDING + * jz 1f + * ud2 + * 1:call *%r11 + * + * + * FineIBT (builds as kCFI + CALL_PADDING + IBT + RETPOLINE and runtime patches into): + * + * __cfi_foo: + * endbr64 + * subl 0x12345678, %r10d + * jz foo + * ud2 + * nop + * foo: + * osp nop3 # was endbr64 + * ... code here ... + * ret + * + * direct caller: + * call foo / call foo+4 + * + * indirect caller: + * lea foo(%rip), %r11 + * ... 
+ * movl $0x12345678, %r10d + * subl $16, %r11 + * nop4 + * call *%r11 + * + */ +enum cfi_mode { + CFI_DEFAULT, /* FineIBT if hardware has IBT, otherwise kCFI */ + CFI_OFF, /* Taditional / IBT depending on .config */ + CFI_KCFI, /* Optionally CALL_PADDING, IBT, RETPOLINE */ + CFI_FINEIBT, /* see arch/x86/kernel/alternative.c */ +}; + +extern enum cfi_mode cfi_mode; + struct pt_regs; #ifdef CONFIG_CFI_CLANG enum bug_trap_type handle_cfi_failure(struct pt_regs *regs); +#define __bpfcall +extern u32 cfi_bpf_hash; + +static inline int cfi_get_offset(void) +{ + switch (cfi_mode) { + case CFI_FINEIBT: + return 16; + case CFI_KCFI: + if (IS_ENABLED(CONFIG_CALL_PADDING)) + return 16; + return 5; + default: + return 0; + } +} +#define cfi_get_offset cfi_get_offset + #else static inline enum bug_trap_type handle_cfi_failure(struct pt_regs *regs) { return BUG_TRAP_TYPE_NONE; } +#define cfi_bpf_hash 0U #endif /* CONFIG_CFI_CLANG */ #endif /* _ASM_X86_CFI_H */ diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index aae7456ece07..811f4a1f0fdf 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -30,6 +30,7 @@ #include <asm/fixmap.h> #include <asm/paravirt.h> #include <asm/asm-prototypes.h> +#include <asm/cfi.h> int __read_mostly alternatives_patched; @@ -842,15 +843,43 @@ void __init_or_module apply_seal_endbr(s32 *start, s32 *end) { } #endif /* CONFIG_X86_KERNEL_IBT */ #ifdef CONFIG_FINEIBT +#define __CFI_DEFAULT CFI_DEFAULT +#elif defined(CONFIG_CFI_CLANG) +#define __CFI_DEFAULT CFI_KCFI +#else +#define __CFI_DEFAULT CFI_OFF +#endif -enum cfi_mode { - CFI_DEFAULT, - CFI_OFF, - CFI_KCFI, - CFI_FINEIBT, -}; +enum cfi_mode cfi_mode __ro_after_init = __CFI_DEFAULT; + +#ifdef CONFIG_CFI_CLANG +struct bpf_insn; + +/* Must match bpf_func_t / DEFINE_BPF_PROG_RUN() */ +extern unsigned int __bpf_prog_runX(const void *ctx, + const struct bpf_insn *insn); + +/* + * Force a reference to the external symbol so the compiler generates + * __kcfi_typid. 
+ */ +__ADDRESSABLE(__bpf_prog_runX); + +/* u32 __ro_after_init cfi_bpf_hash = __kcfi_typeid___bpf_prog_runX; */ +asm ( +" .pushsection .data..ro_after_init,\"aw\",@progbits \n" +" .type cfi_bpf_hash,@object \n" +" .globl cfi_bpf_hash \n" +" .p2align 2, 0x0 \n" +"cfi_bpf_hash: \n" +" .long __kcfi_typeid___bpf_prog_runX \n" +" .size cfi_bpf_hash, 4 \n" +" .popsection \n" +); +#endif + +#ifdef CONFIG_FINEIBT -static enum cfi_mode cfi_mode __ro_after_init = CFI_DEFAULT; static bool cfi_rand __ro_after_init = true; static u32 cfi_seed __ro_after_init; @@ -1159,8 +1188,10 @@ static void __apply_fineibt(s32 *start_retpoline, s32 *end_retpoline, goto err; if (cfi_rand) { - if (builtin) + if (builtin) { cfi_seed = get_random_u32(); + cfi_bpf_hash = cfi_rehash(cfi_bpf_hash); + } ret = cfi_rand_preamble(start_cfi, end_cfi); if (ret) diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index a4f84b7e89b8..6798237a5c83 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -16,6 +16,7 @@ #include <asm/set_memory.h> #include <asm/nospec-branch.h> #include <asm/text-patching.h> +#include <asm/cfi.h> static u8 *emit_code(u8 *ptr, u32 bytes, unsigned int len) { @@ -50,9 +51,11 @@ static u8 *emit_code(u8 *ptr, u32 bytes, unsigned int len) do { EMIT4(b1, b2, b3, b4); EMIT(off, 4); } while (0) #ifdef CONFIG_X86_KERNEL_IBT -#define EMIT_ENDBR() EMIT(gen_endbr(), 4) +#define EMIT_ENDBR() EMIT(gen_endbr(), 4) +#define EMIT_ENDBR_POISON() EMIT(gen_endbr_poison(), 4) #else #define EMIT_ENDBR() +#define EMIT_ENDBR_POISON() #endif static bool is_imm8(int value) @@ -337,6 +340,69 @@ static void pop_callee_regs(u8 **pprog, bool *callee_regs_used) *pprog = prog; } +/* + * Emit the various CFI preambles, see asm/cfi.h and the comments about FineIBT + * in arch/x86/kernel/alternative.c + */ + +static void emit_fineibt(u8 **pprog) +{ + u8 *prog = *pprog; + + EMIT_ENDBR(); + EMIT3_off32(0x41, 0x81, 0xea, cfi_bpf_hash); /* subl $hash, %r10d */ + EMIT2(0x74, 0x07); /* jz.d8 +7 */ + EMIT2(0x0f, 0x0b); /* ud2 */ + EMIT1(0x90); /* nop */ + EMIT_ENDBR_POISON(); + + *pprog = prog; +} + +static void emit_kcfi(u8 **pprog) +{ + u8 *prog = *pprog; + + EMIT1_off32(0xb8, cfi_bpf_hash); /* movl $hash, %eax */ +#ifdef CONFIG_CALL_PADDING + EMIT1(0x90); + EMIT1(0x90); + EMIT1(0x90); + EMIT1(0x90); + EMIT1(0x90); + EMIT1(0x90); + EMIT1(0x90); + EMIT1(0x90); + EMIT1(0x90); + EMIT1(0x90); + EMIT1(0x90); +#endif + EMIT_ENDBR(); + + *pprog = prog; +} + +static void emit_cfi(u8 **pprog) +{ + u8 *prog = *pprog; + + switch (cfi_mode) { + case CFI_FINEIBT: + emit_fineibt(&prog); + break; + + case CFI_KCFI: + emit_kcfi(&prog); + break; + + default: + EMIT_ENDBR(); + break; + } + + *pprog = prog; +} + /* * Emit x86-64 prologue code for BPF program. * bpf_tail_call helper will skip the first X86_TAIL_CALL_OFFSET bytes @@ -347,10 +413,10 @@ static void emit_prologue(u8 **pprog, u32 stack_depth, bool ebpf_from_cbpf, { u8 *prog = *pprog; + emit_cfi(&prog); /* BPF trampoline can be made to work without these nops, * but let's waste 5 bytes for now and optimize later */ - EMIT_ENDBR(); memcpy(prog, x86_nops[5], X86_PATCH_SIZE); prog += X86_PATCH_SIZE; if (!ebpf_from_cbpf) { @@ -3039,9 +3105,16 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog) jit_data->header = header; jit_data->rw_header = rw_header; } - prog->bpf_func = (void *)image; + /* + * ctx.prog_offset is used when CFI preambles put code *before* + * the function. See emit_cfi(). 
For FineIBT specifically this code + * can also be executed and bpf_prog_kallsyms_add() will + * generate an additional symbol to cover this, hence also + * decrement proglen. + */ + prog->bpf_func = (void *)image + cfi_get_offset(); prog->jited = 1; - prog->jited_len = proglen; + prog->jited_len = proglen - cfi_get_offset(); } else { prog = orig_prog; } @@ -3096,6 +3169,7 @@ void bpf_jit_free(struct bpf_prog *prog) kvfree(jit_data->addrs); kfree(jit_data); } + prog->bpf_func = (void *)prog->bpf_func - cfi_get_offset(); hdr = bpf_jit_binary_pack_hdr(prog); bpf_jit_binary_pack_free(hdr, NULL); WARN_ON_ONCE(!bpf_prog_kallsyms_verify_off(prog)); diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 533148ccb549..2839780ba1b5 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -30,6 +30,7 @@ #include <linux/static_call.h> #include <linux/memcontrol.h> #include <linux/kabi.h> +#include <linux/cfi.h> struct bpf_verifier_env; struct bpf_verifier_log; @@ -1278,7 +1279,12 @@ struct bpf_dispatcher { KABI_RESERVE(4) }; -static __always_inline __nocfi unsigned int bpf_dispatcher_nop_func( +#ifndef __bpfcall +#define __bpfcall __nocfi +#endif + +static __always_inline __bpfcall unsigned int bpf_dispatcher_nop_func( + const void *ctx, const struct bpf_insn *insnsi, bpf_func_t bpf_func) @@ -1372,7 +1378,7 @@ int arch_prepare_bpf_dispatcher(void *image, void *buf, s64 *funcs, int num_func #define DEFINE_BPF_DISPATCHER(name) \ __BPF_DISPATCHER_SC(name); \ - noinline __nocfi unsigned int bpf_dispatcher_##name##_func( \ + noinline __bpfcall unsigned int bpf_dispatcher_##name##_func( \ const void *ctx, \ const struct bpf_insn *insnsi, \ bpf_func_t bpf_func) \ @@ -1523,6 +1529,9 @@ struct bpf_prog_aux { struct bpf_kfunc_desc_tab *kfunc_tab; struct bpf_kfunc_btf_tab *kfunc_btf_tab; u32 size_poke_tab; +#ifdef CONFIG_FINEIBT + struct bpf_ksym ksym_prefix; +#endif struct bpf_ksym ksym; const struct bpf_prog_ops *ops; struct bpf_map **used_maps; diff --git a/include/linux/cfi.h b/include/linux/cfi.h index 2309d74e77e6..1ed2d96c0cfc 100644 --- a/include/linux/cfi.h +++ b/include/linux/cfi.h @@ -11,6 +11,13 @@ #include <linux/module.h> #include <asm/cfi.h> +#ifndef cfi_get_offset +static inline int cfi_get_offset(void) +{ + return 0; +} +#endif + #ifdef CONFIG_CFI_CLANG enum bug_trap_type report_cfi_failure(struct pt_regs *regs, unsigned long addr, unsigned long *target, u32 type); diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index d0436f78c4b7..50f305a31415 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -121,6 +121,9 @@ struct bpf_prog *bpf_prog_alloc_no_stats(unsigned int size, gfp_t gfp_extra_flag #endif INIT_LIST_HEAD_RCU(&fp->aux->ksym.lnode); +#ifdef CONFIG_FINEIBT + INIT_LIST_HEAD_RCU(&fp->aux->ksym_prefix.lnode); +#endif mutex_init(&fp->aux->used_maps_mutex); mutex_init(&fp->aux->ext_mutex); mutex_init(&fp->aux->dst_mutex); @@ -692,6 +695,23 @@ void bpf_prog_kallsyms_add(struct bpf_prog *fp) fp->aux->ksym.prog = true; bpf_ksym_add(&fp->aux->ksym); + +#ifdef CONFIG_FINEIBT + /* + * When FineIBT, code in the __cfi_foo() symbols can get executed + * and hence unwinder needs help. 
+ */ + if (cfi_mode != CFI_FINEIBT) + return; + + snprintf(fp->aux->ksym_prefix.name, KSYM_NAME_LEN, + "__cfi_%s", fp->aux->ksym.name); + + fp->aux->ksym_prefix.start = (unsigned long) fp->bpf_func - 16; + fp->aux->ksym_prefix.end = (unsigned long) fp->bpf_func; + + bpf_ksym_add(&fp->aux->ksym_prefix); +#endif } void bpf_prog_kallsyms_del(struct bpf_prog *fp) @@ -700,6 +720,11 @@ void bpf_prog_kallsyms_del(struct bpf_prog *fp) return; bpf_ksym_del(&fp->aux->ksym); +#ifdef CONFIG_FINEIBT + if (cfi_mode != CFI_FINEIBT) + return; + bpf_ksym_del(&fp->aux->ksym_prefix); +#endif } static struct bpf_ksym *bpf_ksym_find(unsigned long addr) -- 2.34.1
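Because each CFI flavour places a preamble of a different length in front of the function body, the JIT has to shift the reported entry point and length by cfi_get_offset(). The standalone sketch below models that accounting; the offsets (16 for FineIBT or for kCFI with call padding, 5 for plain kCFI, 0 otherwise) follow the cfi_get_offset() helper added in this patch.

#include <stdint.h>
#include <stdio.h>

enum demo_cfi_mode { DEMO_CFI_OFF, DEMO_CFI_KCFI, DEMO_CFI_FINEIBT };

static int demo_cfi_get_offset(enum demo_cfi_mode mode, int call_padding)
{
        switch (mode) {
        case DEMO_CFI_FINEIBT:
                return 16;
        case DEMO_CFI_KCFI:
                return call_padding ? 16 : 5;
        default:
                return 0;
        }
}

int main(void)
{
        uintptr_t image = 0x10000;      /* pretend start of the JITed image */
        unsigned int proglen = 200;     /* bytes emitted, preamble included */
        int off = demo_cfi_get_offset(DEMO_CFI_FINEIBT, 1);

        uintptr_t bpf_func = image + off;       /* what prog->bpf_func points at */
        unsigned int jited_len = proglen - off; /* what prog->jited_len reports */

        printf("entry=%#lx len=%u (preamble %d bytes)\n",
               (unsigned long)bpf_func, jited_len, off);
        return 0;
}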
From: Peter Zijlstra <peterz@infradead.org> mainline inclusion from mainline-v6.8-rc1 commit e72d88d18df4e03c80e64c2535f70c64f1dc6fc1 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Where the main BPF program is expected to match bpf_func_t, sub-programs are expected to match bpf_callback_t. This fixes things like: tools/testing/selftests/bpf/progs/bloom_filter_bench.c: bpf_for_each_map_elem(&array_map, bloom_callback, &data, 0); Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20231215092707.451956710@infradead.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- arch/x86/include/asm/cfi.h | 2 ++ arch/x86/kernel/alternative.c | 18 ++++++++++++++++++ arch/x86/net/bpf_jit_comp.c | 18 ++++++++++-------- 3 files changed, 30 insertions(+), 8 deletions(-) diff --git a/arch/x86/include/asm/cfi.h b/arch/x86/include/asm/cfi.h index 7a7b0b823a98..8779abd217b7 100644 --- a/arch/x86/include/asm/cfi.h +++ b/arch/x86/include/asm/cfi.h @@ -106,6 +106,7 @@ struct pt_regs; enum bug_trap_type handle_cfi_failure(struct pt_regs *regs); #define __bpfcall extern u32 cfi_bpf_hash; +extern u32 cfi_bpf_subprog_hash; static inline int cfi_get_offset(void) { @@ -128,6 +129,7 @@ static inline enum bug_trap_type handle_cfi_failure(struct pt_regs *regs) return BUG_TRAP_TYPE_NONE; } #define cfi_bpf_hash 0U +#define cfi_bpf_subprog_hash 0U #endif /* CONFIG_CFI_CLANG */ #endif /* _ASM_X86_CFI_H */ diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index 811f4a1f0fdf..287065ad3feb 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -876,6 +876,23 @@ asm ( " .size cfi_bpf_hash, 4 \n" " .popsection \n" ); + +/* Must match bpf_callback_t */ +extern u64 __bpf_callback_fn(u64, u64, u64, u64, u64); + +__ADDRESSABLE(__bpf_callback_fn); + +/* u32 __ro_after_init cfi_bpf_subprog_hash = __kcfi_typeid___bpf_callback_fn; */ +asm ( +" .pushsection .data..ro_after_init,\"aw\",@progbits \n" +" .type cfi_bpf_subprog_hash,@object \n" +" .globl cfi_bpf_subprog_hash \n" +" .p2align 2, 0x0 \n" +"cfi_bpf_subprog_hash: \n" +" .long __kcfi_typeid___bpf_callback_fn \n" +" .size cfi_bpf_subprog_hash, 4 \n" +" .popsection \n" +); #endif #ifdef CONFIG_FINEIBT @@ -1191,6 +1208,7 @@ static void __apply_fineibt(s32 *start_retpoline, s32 *end_retpoline, if (builtin) { cfi_seed = get_random_u32(); cfi_bpf_hash = cfi_rehash(cfi_bpf_hash); + cfi_bpf_subprog_hash = cfi_rehash(cfi_bpf_subprog_hash); } ret = cfi_rand_preamble(start_cfi, end_cfi); diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index 6798237a5c83..35b04d0d88e5 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -345,12 +345,13 @@ static void pop_callee_regs(u8 **pprog, bool *callee_regs_used) * in arch/x86/kernel/alternative.c */ -static void emit_fineibt(u8 **pprog) +static void emit_fineibt(u8 **pprog, bool is_subprog) { + u32 hash = is_subprog ? 
cfi_bpf_subprog_hash : cfi_bpf_hash; u8 *prog = *pprog; EMIT_ENDBR(); - EMIT3_off32(0x41, 0x81, 0xea, cfi_bpf_hash); /* subl $hash, %r10d */ + EMIT3_off32(0x41, 0x81, 0xea, hash); /* subl $hash, %r10d */ EMIT2(0x74, 0x07); /* jz.d8 +7 */ EMIT2(0x0f, 0x0b); /* ud2 */ EMIT1(0x90); /* nop */ @@ -359,11 +360,12 @@ static void emit_fineibt(u8 **pprog) *pprog = prog; } -static void emit_kcfi(u8 **pprog) +static void emit_kcfi(u8 **pprog, bool is_subprog) { + u32 hash = is_subprog ? cfi_bpf_subprog_hash : cfi_bpf_hash; u8 *prog = *pprog; - EMIT1_off32(0xb8, cfi_bpf_hash); /* movl $hash, %eax */ + EMIT1_off32(0xb8, hash); /* movl $hash, %eax */ #ifdef CONFIG_CALL_PADDING EMIT1(0x90); EMIT1(0x90); @@ -382,17 +384,17 @@ static void emit_kcfi(u8 **pprog) *pprog = prog; } -static void emit_cfi(u8 **pprog) +static void emit_cfi(u8 **pprog, bool is_subprog) { u8 *prog = *pprog; switch (cfi_mode) { case CFI_FINEIBT: - emit_fineibt(&prog); + emit_fineibt(&prog, is_subprog); break; case CFI_KCFI: - emit_kcfi(&prog); + emit_kcfi(&prog, is_subprog); break; default: @@ -413,7 +415,7 @@ static void emit_prologue(u8 **pprog, u32 stack_depth, bool ebpf_from_cbpf, { u8 *prog = *pprog; - emit_cfi(&prog); + emit_cfi(&prog, is_subprog); /* BPF trampoline can be made to work without these nops, * but let's waste 5 bytes for now and optimize later */ -- 2.34.1
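The reason subprograms need their own hash is that a kCFI-style indirect call compares the hash stored in front of the callee with the hash derived from the function type the caller expects, and callbacks such as the one passed to bpf_for_each_map_elem() are invoked as bpf_callback_t rather than bpf_func_t. The toy model below, using made-up hash values, shows how embedding the wrong hash would trip that check:

#include <stdint.h>
#include <stdio.h>

struct demo_func {
        uint32_t kcfi_hash;     /* what the preamble in front of the callee embeds */
        /* ... generated code would follow ... */
};

static int demo_indirect_call_ok(const struct demo_func *callee, uint32_t expected)
{
        return callee->kcfi_hash == expected;   /* mismatch traps (ud2 / #CP) in real life */
}

int main(void)
{
        const uint32_t hash_bpf_func     = 0x12345678;  /* stand-in for cfi_bpf_hash */
        const uint32_t hash_bpf_callback = 0x9abcdef0;  /* stand-in for cfi_bpf_subprog_hash */

        struct demo_func subprog = { .kcfi_hash = hash_bpf_callback };

        printf("callback-typed call: %s\n",
               demo_indirect_call_ok(&subprog, hash_bpf_callback) ? "ok" : "trap");
        printf("bpf_func-typed call: %s\n",
               demo_indirect_call_ok(&subprog, hash_bpf_func) ? "ok" : "trap");
        return 0;
}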
From: Peter Zijlstra <peterz@infradead.org> mainline inclusion from mainline-v6.8-rc1 commit 2cd3e3772e41377f32d6eea643e0590774e9187c category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- BPF struct_ops uses __arch_prepare_bpf_trampoline() to write trampolines for indirect function calls. These tramplines much have matching CFI. In order to obtain the correct CFI hash for the various methods, add a matching structure that contains stub functions, the compiler will generate correct CFI which we can pilfer for the trampolines. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20231215092707.566977112@infradead.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Conflicts: include/linux/bpf.h [Fix conflicts because of kabi.] Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- arch/x86/include/asm/cfi.h | 6 +++ arch/x86/kernel/alternative.c | 22 +++++++++++ arch/x86/net/bpf_jit_comp.c | 66 ++++++++++++++++++++------------ include/linux/bpf.h | 13 +++++++ kernel/bpf/bpf_struct_ops.c | 16 ++++---- net/bpf/bpf_dummy_struct_ops.c | 31 ++++++++++++++- net/ipv4/bpf_tcp_ca.c | 69 ++++++++++++++++++++++++++++++++++ 7 files changed, 191 insertions(+), 32 deletions(-) diff --git a/arch/x86/include/asm/cfi.h b/arch/x86/include/asm/cfi.h index 8779abd217b7..1a50b2cd4713 100644 --- a/arch/x86/include/asm/cfi.h +++ b/arch/x86/include/asm/cfi.h @@ -123,6 +123,8 @@ static inline int cfi_get_offset(void) } #define cfi_get_offset cfi_get_offset +extern u32 cfi_get_func_hash(void *func); + #else static inline enum bug_trap_type handle_cfi_failure(struct pt_regs *regs) { @@ -130,6 +132,10 @@ static inline enum bug_trap_type handle_cfi_failure(struct pt_regs *regs) } #define cfi_bpf_hash 0U #define cfi_bpf_subprog_hash 0U +static inline u32 cfi_get_func_hash(void *func) +{ + return 0; +} #endif /* CONFIG_CFI_CLANG */ #endif /* _ASM_X86_CFI_H */ diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index 287065ad3feb..183d42302243 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -893,6 +893,28 @@ asm ( " .size cfi_bpf_subprog_hash, 4 \n" " .popsection \n" ); + +u32 cfi_get_func_hash(void *func) +{ + u32 hash; + + func -= cfi_get_offset(); + switch (cfi_mode) { + case CFI_FINEIBT: + func += 7; + break; + case CFI_KCFI: + func += 1; + break; + default: + return 0; + } + + if (get_kernel_nofault(hash, func)) + return 0; + + return hash; +} #endif #ifdef CONFIG_FINEIBT diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index 35b04d0d88e5..5c020dcec31b 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -345,9 +345,8 @@ static void pop_callee_regs(u8 **pprog, bool *callee_regs_used) * in arch/x86/kernel/alternative.c */ -static void emit_fineibt(u8 **pprog, bool is_subprog) +static void emit_fineibt(u8 **pprog, u32 hash) { - u32 hash = is_subprog ? cfi_bpf_subprog_hash : cfi_bpf_hash; u8 *prog = *pprog; EMIT_ENDBR(); @@ -360,9 +359,8 @@ static void emit_fineibt(u8 **pprog, bool is_subprog) *pprog = prog; } -static void emit_kcfi(u8 **pprog, bool is_subprog) +static void emit_kcfi(u8 **pprog, u32 hash) { - u32 hash = is_subprog ? 
cfi_bpf_subprog_hash : cfi_bpf_hash; u8 *prog = *pprog; EMIT1_off32(0xb8, hash); /* movl $hash, %eax */ @@ -384,17 +382,17 @@ static void emit_kcfi(u8 **pprog, bool is_subprog) *pprog = prog; } -static void emit_cfi(u8 **pprog, bool is_subprog) +static void emit_cfi(u8 **pprog, u32 hash) { u8 *prog = *pprog; switch (cfi_mode) { case CFI_FINEIBT: - emit_fineibt(&prog, is_subprog); + emit_fineibt(&prog, hash); break; case CFI_KCFI: - emit_kcfi(&prog, is_subprog); + emit_kcfi(&prog, hash); break; default: @@ -415,7 +413,7 @@ static void emit_prologue(u8 **pprog, u32 stack_depth, bool ebpf_from_cbpf, { u8 *prog = *pprog; - emit_cfi(&prog, is_subprog); + emit_cfi(&prog, is_subprog ? cfi_bpf_subprog_hash : cfi_bpf_hash); /* BPF trampoline can be made to work without these nops, * but let's waste 5 bytes for now and optimize later */ @@ -2536,10 +2534,19 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im u8 *prog; bool save_ret; + /* + * F_INDIRECT is only compatible with F_RET_FENTRY_RET, it is + * explicitly incompatible with F_CALL_ORIG | F_SKIP_FRAME | F_IP_ARG + * because @func_addr. + */ + WARN_ON_ONCE((flags & BPF_TRAMP_F_INDIRECT) && + (flags & ~(BPF_TRAMP_F_INDIRECT | BPF_TRAMP_F_RET_FENTRY_RET))); + /* extra registers for struct arguments */ - for (i = 0; i < m->nr_args; i++) + for (i = 0; i < m->nr_args; i++) { if (m->arg_flags[i] & BTF_FMODEL_STRUCT_ARG) nr_regs += (m->arg_size[i] + 7) / 8 - 1; + } /* x86-64 supports up to MAX_BPF_FUNC_ARGS arguments. 1-6 * are passed through regs, the remains are through stack. @@ -2622,20 +2629,27 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im prog = rw_image; - EMIT_ENDBR(); - /* - * This is the direct-call trampoline, as such it needs accounting - * for the __fentry__ call. - */ - x86_call_depth_emit_accounting(&prog, NULL); + if (flags & BPF_TRAMP_F_INDIRECT) { + /* + * Indirect call for bpf_struct_ops + */ + emit_cfi(&prog, cfi_get_func_hash(func_addr)); + } else { + /* + * Direct-call fentry stub, as such it needs accounting for the + * __fentry__ call. + */ + x86_call_depth_emit_accounting(&prog, NULL); + } EMIT1(0x55); /* push rbp */ EMIT3(0x48, 0x89, 0xE5); /* mov rbp, rsp */ - if (!is_imm8(stack_size)) + if (!is_imm8(stack_size)) { /* sub rsp, stack_size */ EMIT3_off32(0x48, 0x81, 0xEC, stack_size); - else + } else { /* sub rsp, stack_size */ EMIT4(0x48, 0x83, 0xEC, stack_size); + } if (flags & BPF_TRAMP_F_TAIL_CALL_CTX) EMIT1(0x50); /* push rax */ /* mov QWORD PTR [rbp - rbx_off], rbx */ @@ -2669,10 +2683,11 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im } } - if (fentry->nr_links) + if (fentry->nr_links) { if (invoke_bpf(m, &prog, fentry, regs_off, run_ctx_off, flags & BPF_TRAMP_F_RET_FENTRY_RET, image, rw_image)) return -EINVAL; + } if (fmod_ret->nr_links) { branches = kcalloc(fmod_ret->nr_links, sizeof(u8 *), @@ -2691,11 +2706,12 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im restore_regs(m, &prog, regs_off); save_args(m, &prog, arg_stack_off, true); - if (flags & BPF_TRAMP_F_TAIL_CALL_CTX) + if (flags & BPF_TRAMP_F_TAIL_CALL_CTX) { /* Before calling the original function, restore the * tail_call_cnt from stack to rax. 
*/ RESTORE_TAIL_CALL_CNT(stack_size); + } if (flags & BPF_TRAMP_F_ORIG_STACK) { emit_ldx(&prog, BPF_DW, BPF_REG_6, BPF_REG_FP, 8); @@ -2724,17 +2740,19 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im /* Update the branches saved in invoke_bpf_mod_ret with the * aligned address of do_fexit. */ - for (i = 0; i < fmod_ret->nr_links; i++) + for (i = 0; i < fmod_ret->nr_links; i++) { emit_cond_near_jump(&branches[i], image + (prog - (u8 *)rw_image), image + (branches[i] - (u8 *)rw_image), X86_JNE); + } } - if (fexit->nr_links) + if (fexit->nr_links) { if (invoke_bpf(m, &prog, fexit, regs_off, run_ctx_off, false, image, rw_image)) { ret = -EINVAL; goto cleanup; } + } if (flags & BPF_TRAMP_F_RESTORE_REGS) restore_regs(m, &prog, regs_off); @@ -2751,11 +2769,12 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im ret = -EINVAL; goto cleanup; } - } else if (flags & BPF_TRAMP_F_TAIL_CALL_CTX) + } else if (flags & BPF_TRAMP_F_TAIL_CALL_CTX) { /* Before running the original function, restore the * tail_call_cnt from stack to rax. */ RESTORE_TAIL_CALL_CNT(stack_size); + } /* restore return value of orig_call or fentry prog back into RAX */ if (save_ret) @@ -2763,9 +2782,10 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *rw_im emit_ldx(&prog, BPF_DW, BPF_REG_6, BPF_REG_FP, -rbx_off); EMIT1(0xC9); /* leave */ - if (flags & BPF_TRAMP_F_SKIP_FRAME) + if (flags & BPF_TRAMP_F_SKIP_FRAME) { /* skip our return address and return to parent */ EMIT4(0x48, 0x83, 0xC4, 8); /* add rsp, 8 */ + } emit_return(&prog, image + (prog - (u8 *)rw_image)); /* Make sure the trampoline generation logic doesn't overflow */ if (WARN_ON_ONCE(prog > (u8 *)rw_image_end - BPF_INSN_SAFETY)) { diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 2839780ba1b5..3bb9b329340f 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1113,6 +1113,17 @@ struct btf_func_model { */ #define BPF_TRAMP_F_TAIL_CALL_CTX BIT(7) +/* + * Indicate the trampoline should be suitable to receive indirect calls; + * without this indirectly calling the generated code can result in #UD/#CP, + * depending on the CFI options. + * + * Used by bpf_struct_ops. + * + * Incompatible with FENTRY usage, overloads @func_addr argument. + */ +#define BPF_TRAMP_F_INDIRECT BIT(8) + /* Each call __bpf_prog_enter + call bpf_func + call __bpf_prog_exit is ~50 * bytes on x86. 
*/ @@ -1787,6 +1798,7 @@ struct bpf_struct_ops { struct btf_func_model func_models[BPF_STRUCT_OPS_MAX_NR_MEMBERS]; u32 type_id; u32 value_id; + void *cfi_stubs; KABI_RESERVE(1) KABI_RESERVE(2) @@ -1805,6 +1817,7 @@ int bpf_struct_ops_map_sys_lookup_elem(struct bpf_map *map, void *key, int bpf_struct_ops_prepare_trampoline(struct bpf_tramp_links *tlinks, struct bpf_tramp_link *link, const struct btf_func_model *model, + void *stub_func, void *image, void *image_end); static inline bool bpf_try_module_get(const void *data, struct module *owner) { diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c index 4d53c53fc5aa..02068bd0e4d9 100644 --- a/kernel/bpf/bpf_struct_ops.c +++ b/kernel/bpf/bpf_struct_ops.c @@ -352,17 +352,16 @@ const struct bpf_link_ops bpf_struct_ops_link_lops = { int bpf_struct_ops_prepare_trampoline(struct bpf_tramp_links *tlinks, struct bpf_tramp_link *link, const struct btf_func_model *model, - void *image, void *image_end) + void *stub_func, void *image, void *image_end) { - u32 flags; + u32 flags = BPF_TRAMP_F_INDIRECT; int size; tlinks[BPF_TRAMP_FENTRY].links[0] = link; tlinks[BPF_TRAMP_FENTRY].nr_links = 1; - /* BPF_TRAMP_F_RET_FENTRY_RET is only used by bpf_struct_ops, - * and it must be used alone. - */ - flags = model->ret_size > 0 ? BPF_TRAMP_F_RET_FENTRY_RET : 0; + + if (model->ret_size > 0) + flags |= BPF_TRAMP_F_RET_FENTRY_RET; size = arch_bpf_trampoline_size(model, flags, tlinks, NULL); if (size < 0) @@ -370,7 +369,7 @@ int bpf_struct_ops_prepare_trampoline(struct bpf_tramp_links *tlinks, if (size > (unsigned long)image_end - (unsigned long)image) return -E2BIG; return arch_prepare_bpf_trampoline(NULL, image, image_end, - model, flags, tlinks, NULL); + model, flags, tlinks, stub_func); } static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, @@ -504,11 +503,12 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, err = bpf_struct_ops_prepare_trampoline(tlinks, link, &st_ops->func_models[i], + *(void **)(st_ops->cfi_stubs + moff), image, image_end); if (err < 0) goto reset_unlock; - *(void **)(kdata + moff) = image; + *(void **)(kdata + moff) = image + cfi_get_offset(); image += err; /* put prog_id to udata */ diff --git a/net/bpf/bpf_dummy_struct_ops.c b/net/bpf/bpf_dummy_struct_ops.c index 2748f9d77b18..8906f7bdf4a9 100644 --- a/net/bpf/bpf_dummy_struct_ops.c +++ b/net/bpf/bpf_dummy_struct_ops.c @@ -12,6 +12,11 @@ extern struct bpf_struct_ops bpf_bpf_dummy_ops; /* A common type for test_N with return value in bpf_dummy_ops */ typedef int (*dummy_ops_test_ret_fn)(struct bpf_dummy_ops_state *state, ...); +static int dummy_ops_test_ret_function(struct bpf_dummy_ops_state *state, ...) 
+{ + return 0; +} + struct bpf_dummy_ops_test_args { u64 args[MAX_BPF_FUNC_ARGS]; struct bpf_dummy_ops_state state; @@ -62,7 +67,7 @@ static int dummy_ops_copy_args(struct bpf_dummy_ops_test_args *args) static int dummy_ops_call_op(void *image, struct bpf_dummy_ops_test_args *args) { - dummy_ops_test_ret_fn test = (void *)image; + dummy_ops_test_ret_fn test = (void *)image + cfi_get_offset(); struct bpf_dummy_ops_state *state = NULL; /* state needs to be NULL if args[0] is 0 */ @@ -119,6 +124,7 @@ int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr, op_idx = prog->expected_attach_type; err = bpf_struct_ops_prepare_trampoline(tlinks, link, &st_ops->func_models[op_idx], + &dummy_ops_test_ret_function, image, image + PAGE_SIZE); if (err < 0) goto out; @@ -219,6 +225,28 @@ static void bpf_dummy_unreg(void *kdata) { } +static int bpf_dummy_test_1(struct bpf_dummy_ops_state *cb) +{ + return 0; +} + +static int bpf_dummy_test_2(struct bpf_dummy_ops_state *cb, int a1, unsigned short a2, + char a3, unsigned long a4) +{ + return 0; +} + +static int bpf_dummy_test_sleepable(struct bpf_dummy_ops_state *cb) +{ + return 0; +} + +static struct bpf_dummy_ops __bpf_bpf_dummy_ops = { + .test_1 = bpf_dummy_test_1, + .test_2 = bpf_dummy_test_2, + .test_sleepable = bpf_dummy_test_sleepable, +}; + struct bpf_struct_ops bpf_bpf_dummy_ops = { .verifier_ops = &bpf_dummy_verifier_ops, .init = bpf_dummy_init, @@ -227,4 +255,5 @@ struct bpf_struct_ops bpf_bpf_dummy_ops = { .reg = bpf_dummy_reg, .unreg = bpf_dummy_unreg, .name = "bpf_dummy_ops", + .cfi_stubs = &__bpf_bpf_dummy_ops, }; diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c index 39dcccf0f174..ae8b15e6896f 100644 --- a/net/ipv4/bpf_tcp_ca.c +++ b/net/ipv4/bpf_tcp_ca.c @@ -271,6 +271,74 @@ static int bpf_tcp_ca_validate(void *kdata) return tcp_validate_congestion_control(kdata); } +static u32 bpf_tcp_ca_ssthresh(struct sock *sk) +{ + return 0; +} + +static void bpf_tcp_ca_cong_avoid(struct sock *sk, u32 ack, u32 acked) +{ +} + +static void bpf_tcp_ca_set_state(struct sock *sk, u8 new_state) +{ +} + +static void bpf_tcp_ca_cwnd_event(struct sock *sk, enum tcp_ca_event ev) +{ +} + +static void bpf_tcp_ca_in_ack_event(struct sock *sk, u32 flags) +{ +} + +static void bpf_tcp_ca_pkts_acked(struct sock *sk, const struct ack_sample *sample) +{ +} + +static u32 bpf_tcp_ca_min_tso_segs(struct sock *sk) +{ + return 0; +} + +static void bpf_tcp_ca_cong_control(struct sock *sk, const struct rate_sample *rs) +{ +} + +static u32 bpf_tcp_ca_undo_cwnd(struct sock *sk) +{ + return 0; +} + +static u32 bpf_tcp_ca_sndbuf_expand(struct sock *sk) +{ + return 0; +} + +static void __bpf_tcp_ca_init(struct sock *sk) +{ +} + +static void __bpf_tcp_ca_release(struct sock *sk) +{ +} + +static struct tcp_congestion_ops __bpf_ops_tcp_congestion_ops = { + .ssthresh = bpf_tcp_ca_ssthresh, + .cong_avoid = bpf_tcp_ca_cong_avoid, + .set_state = bpf_tcp_ca_set_state, + .cwnd_event = bpf_tcp_ca_cwnd_event, + .in_ack_event = bpf_tcp_ca_in_ack_event, + .pkts_acked = bpf_tcp_ca_pkts_acked, + .min_tso_segs = bpf_tcp_ca_min_tso_segs, + .cong_control = bpf_tcp_ca_cong_control, + .undo_cwnd = bpf_tcp_ca_undo_cwnd, + .sndbuf_expand = bpf_tcp_ca_sndbuf_expand, + + .init = __bpf_tcp_ca_init, + .release = __bpf_tcp_ca_release, +}; + struct bpf_struct_ops bpf_tcp_congestion_ops = { .verifier_ops = &bpf_tcp_ca_verifier_ops, .reg = bpf_tcp_ca_reg, @@ -281,6 +349,7 @@ struct bpf_struct_ops bpf_tcp_congestion_ops = { .init = bpf_tcp_ca_init, .validate = bpf_tcp_ca_validate, 
.name = "tcp_congestion_ops", + .cfi_stubs = &__bpf_ops_tcp_congestion_ops, }; static int __init bpf_tcp_ca_kfunc_init(void) -- 2.34.1
From: Peter Zijlstra <peterz@infradead.org> mainline inclusion from mainline-v6.8-rc1 commit e9d13b9d2f99ccf7afeab490d97eaa5ac9846598 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Add a CFI_NOSEAL() helper to mark functions that need to retain their CFI information, despite not otherwise leaking their address. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20231215092707.669401084@infradead.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- arch/x86/include/asm/cfi.h | 5 +++++ include/linux/cfi.h | 4 ++++ 2 files changed, 9 insertions(+) diff --git a/arch/x86/include/asm/cfi.h b/arch/x86/include/asm/cfi.h index 1a50b2cd4713..7cd752557905 100644 --- a/arch/x86/include/asm/cfi.h +++ b/arch/x86/include/asm/cfi.h @@ -8,6 +8,7 @@ * Copyright (C) 2022 Google LLC */ #include <linux/bug.h> +#include <asm/ibt.h> /* * An overview of the various calling conventions... @@ -138,4 +139,8 @@ static inline u32 cfi_get_func_hash(void *func) } #endif /* CONFIG_CFI_CLANG */ +#if HAS_KERNEL_IBT == 1 +#define CFI_NOSEAL(x) asm(IBT_NOSEAL(__stringify(x))) +#endif + #endif /* _ASM_X86_CFI_H */ diff --git a/include/linux/cfi.h b/include/linux/cfi.h index 1ed2d96c0cfc..f0df518e11dd 100644 --- a/include/linux/cfi.h +++ b/include/linux/cfi.h @@ -46,4 +46,8 @@ static inline void module_cfi_finalize(const Elf_Ehdr *hdr, #endif /* CONFIG_ARCH_USES_CFI_TRAPS */ #endif /* CONFIG_MODULES */ +#ifndef CFI_NOSEAL +#define CFI_NOSEAL(x) +#endif + #endif /* _LINUX_CFI_H */ -- 2.34.1
From: Peter Zijlstra <peterz@infradead.org> mainline inclusion from mainline-v6.8-rc1 commit e4c00339891c074c76f626ac82981963cbba5332 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Ensure the various dtor functions match their prototype and retain their CFI signatures, since they don't have their address taken, they are prone to not getting CFI, making them impossible to call indirectly. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20231215092707.799451071@infradead.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- kernel/bpf/cpumask.c | 8 +++++++- kernel/bpf/helpers.c | 16 ++++++++++++++-- net/bpf/test_run.c | 15 +++++++++++++-- 3 files changed, 34 insertions(+), 5 deletions(-) diff --git a/kernel/bpf/cpumask.c b/kernel/bpf/cpumask.c index e01c741e54e7..6acecc8ebd61 100644 --- a/kernel/bpf/cpumask.c +++ b/kernel/bpf/cpumask.c @@ -96,6 +96,12 @@ __bpf_kfunc void bpf_cpumask_release(struct bpf_cpumask *cpumask) migrate_enable(); } +__bpf_kfunc void bpf_cpumask_release_dtor(void *cpumask) +{ + bpf_cpumask_release(cpumask); +} +CFI_NOSEAL(bpf_cpumask_release_dtor); + /** * bpf_cpumask_first() - Get the index of the first nonzero bit in the cpumask. * @cpumask: The cpumask being queried. @@ -441,7 +447,7 @@ static const struct btf_kfunc_id_set cpumask_kfunc_set = { BTF_ID_LIST(cpumask_dtor_ids) BTF_ID(struct, bpf_cpumask) -BTF_ID(func, bpf_cpumask_release) +BTF_ID(func, bpf_cpumask_release_dtor) static int __init cpumask_kfunc_init(void) { diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index a17fee069161..0edec71be0a7 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -2270,6 +2270,12 @@ __bpf_kfunc void bpf_task_release(struct task_struct *p) put_task_struct_rcu_user(p); } +__bpf_kfunc void bpf_task_release_dtor(void *p) +{ + put_task_struct_rcu_user(p); +} +CFI_NOSEAL(bpf_task_release_dtor); + #ifdef CONFIG_CGROUPS /** * bpf_cgroup_acquire - Acquire a reference to a cgroup. A cgroup acquired by @@ -2294,6 +2300,12 @@ __bpf_kfunc void bpf_cgroup_release(struct cgroup *cgrp) cgroup_put(cgrp); } +__bpf_kfunc void bpf_cgroup_release_dtor(void *cgrp) +{ + cgroup_put(cgrp); +} +CFI_NOSEAL(bpf_cgroup_release_dtor); + /** * bpf_cgroup_ancestor - Perform a lookup on an entry in a cgroup's ancestor * array. 
A cgroup returned by this kfunc which is not subsequently stored in a @@ -2653,10 +2665,10 @@ static const struct btf_kfunc_id_set generic_kfunc_set = { BTF_ID_LIST(generic_dtor_ids) BTF_ID(struct, task_struct) -BTF_ID(func, bpf_task_release) +BTF_ID(func, bpf_task_release_dtor) #ifdef CONFIG_CGROUPS BTF_ID(struct, cgroup) -BTF_ID(func, bpf_cgroup_release) +BTF_ID(func, bpf_cgroup_release_dtor) #endif BTF_SET8_START(common_btf_ids) diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index 6ccb767169fc..63a875c638b6 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -601,10 +601,21 @@ __bpf_kfunc void bpf_kfunc_call_test_release(struct prog_test_ref_kfunc *p) refcount_dec(&p->cnt); } +__bpf_kfunc void bpf_kfunc_call_test_release_dtor(void *p) +{ + bpf_kfunc_call_test_release(p); +} +CFI_NOSEAL(bpf_kfunc_call_test_release_dtor); + __bpf_kfunc void bpf_kfunc_call_memb_release(struct prog_test_member *p) { } +__bpf_kfunc void bpf_kfunc_call_memb_release_dtor(void *p) +{ +} +CFI_NOSEAL(bpf_kfunc_call_memb_release_dtor); + __bpf_kfunc_end_defs(); BTF_SET8_START(bpf_test_modify_return_ids) @@ -1698,9 +1709,9 @@ static const struct btf_kfunc_id_set bpf_prog_test_kfunc_set = { BTF_ID_LIST(bpf_prog_test_dtor_kfunc_ids) BTF_ID(struct, prog_test_ref_kfunc) -BTF_ID(func, bpf_kfunc_call_test_release) +BTF_ID(func, bpf_kfunc_call_test_release_dtor) BTF_ID(struct, prog_test_member) -BTF_ID(func, bpf_kfunc_call_memb_release) +BTF_ID(func, bpf_kfunc_call_memb_release_dtor) static int __init bpf_prog_test_run_init(void) { -- 2.34.1
From: Alexei Starovoitov <ast@kernel.org> mainline inclusion from mainline-v6.8-rc1 commit 0c970ed2f87c058fe3ddeb4d7d8f64f72cf41d7a category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- The func_addr used to be NULL for indirect trampolines used by struct_ops. Now func_addr is a valid function pointer. Hence use BPF_TRAMP_F_INDIRECT flag to detect such condition. Fixes: 2cd3e3772e41 ("x86/cfi,bpf: Fix bpf_struct_ops CFI") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/bpf/20231216004549.78355-1-alexei.starovoitov@gmail.... Conflicts: tools/testing/selftests/bpf/DENYLIST.s390x [Fix conflict because context diff.] Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- arch/s390/net/bpf_jit_comp.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c index 1f22da0ec420..28df56b06f5a 100644 --- a/arch/s390/net/bpf_jit_comp.c +++ b/arch/s390/net/bpf_jit_comp.c @@ -2258,7 +2258,8 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, return -ENOTSUPP; /* Return to %r14, since func_addr and %r0 are not available. */ - if (!func_addr && !(flags & BPF_TRAMP_F_ORIG_STACK)) + if ((!func_addr && !(flags & BPF_TRAMP_F_ORIG_STACK)) || + (flags & BPF_TRAMP_F_INDIRECT)) flags |= BPF_TRAMP_F_SKIP_FRAME; /* -- 2.34.1
From: Alexei Starovoitov <ast@kernel.org> mainline inclusion from mainline-v6.8-rc1 commit a8b242d77bd72556b7a9d8be779f7d27b95ba73c category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Compilers optimize conditional operators at will, but often bpf programmers want to force compilers to keep the same operator in asm as it's written in C. Introduce bpf_cmp_likely/unlikely(var1, conditional_op, var2) macros that can be used as: - if (seen >= 1000) + if (bpf_cmp_unlikely(seen, >=, 1000)) The macros take advantage of BPF assembly that is C like. The macros check the sign of variable 'seen' and emits either signed or unsigned compare. For example: int a; bpf_cmp_unlikely(a, >, 0) will be translated to 'if rX s> 0 goto' in BPF assembly. unsigned int a; bpf_cmp_unlikely(a, >, 0) will be translated to 'if rX > 0 goto' in BPF assembly. C type conversions coupled with comparison operator are tricky. int i = -1; unsigned int j = 1; if (i < j) // this is false. long i = -1; unsigned int j = 1; if (i < j) // this is true. Make sure BPF program is compiled with -Wsign-compare then the macros will catch the mistake. The macros check LHS (left hand side) only to figure out the sign of compare. 'if 0 < rX goto' is not allowed in the assembly, so the users have to use a variable on LHS anyway. The patch updates few tests to demonstrate the use of the macros. The macro allows to use BPF_JSET in C code, since LLVM doesn't generate it at present. For example: if (i & j) compiles into r0 &= r1; if r0 == 0 goto while if (bpf_cmp_unlikely(i, &, j)) compiles into if r0 & r1 goto Note that the macros has to be careful with RHS assembly predicate. Since: u64 __rhs = 1ull << 42; asm goto("if r0 < %[rhs] goto +1" :: [rhs] "ri" (__rhs)); LLVM will silently truncate 64-bit constant into s32 imm. Note that [lhs] "r"((short)LHS) the type cast is a workaround for LLVM issue. When LHS is exactly 32-bit LLVM emits redundant <<=32, >>=32 to zero upper 32-bits. When LHS is 64 or 16 or 8-bit variable there are no shifts. When LHS is 32-bit the (u64) cast doesn't help. Hence use (short) cast. It does _not_ truncate the variable before it's assigned to a register. Traditional likely()/unlikely() macros that use __builtin_expect(!!(x), 1 or 0) have no effect on these macros, hence macros implement the logic manually. bpf_cmp_unlikely() macro preserves compare operator as-is while bpf_cmp_likely() macro flips the compare. Consider two cases: A. for() { if (foo >= 10) { bar += foo; } other code; } B. for() { if (foo >= 10) break; other code; } It's ok to use either bpf_cmp_likely or bpf_cmp_unlikely macros in both cases, but consider that 'break' is effectively 'goto out_of_the_loop'. Hence it's better to use bpf_cmp_unlikely in the B case. While 'bar += foo' is better to keep as 'fallthrough' == likely code path in the A case. When it's written as: A. for() { if (bpf_cmp_likely(foo, >=, 10)) { bar += foo; } other code; } B. for() { if (bpf_cmp_unlikely(foo, >=, 10)) break; other code; } The assembly will look like: A. for() { if r1 < 10 goto L1; bar += foo; L1: other code; } B. for() { if r1 >= 10 goto L2; other code; } L2: The bpf_cmp_likely vs bpf_cmp_unlikely changes basic block layout, hence it will greatly influence the verification process. 
The number of processed instructions will be different, since the verifier walks the fallthrough first. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/bpf/20231226191148.48536-3-alexei.starovoitov@gmail.... Conflicts: tools/testing/selftests/bpf/prog_tests/exceptions.c tools/testing/selftests/bpf/bpf_experimental.h tools/testing/selftests/bpf/progs/iters_task_vma.c [exceptions.c is no exist, just ignore it. And fix other simple conflicts.] Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- .../testing/selftests/bpf/bpf_experimental.h | 69 +++++++++++++++++++ .../selftests/bpf/progs/iters_task_vma.c | 2 +- 2 files changed, 70 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h index facf41fec72f..17127c2f928a 100644 --- a/tools/testing/selftests/bpf/bpf_experimental.h +++ b/tools/testing/selftests/bpf/bpf_experimental.h @@ -189,4 +189,73 @@ extern int bpf_iter_css_new(struct bpf_iter_css *it, extern struct cgroup_subsys_state *bpf_iter_css_next(struct bpf_iter_css *it) __weak __ksym; extern void bpf_iter_css_destroy(struct bpf_iter_css *it) __weak __ksym; +#define __cmp_cannot_be_signed(x) \ + __builtin_strcmp(#x, "==") == 0 || __builtin_strcmp(#x, "!=") == 0 || \ + __builtin_strcmp(#x, "&") == 0 + +#define __is_signed_type(type) (((type)(-1)) < (type)1) + +#define __bpf_cmp(LHS, OP, SIGN, PRED, RHS, DEFAULT) \ + ({ \ + __label__ l_true; \ + bool ret = DEFAULT; \ + asm volatile goto("if %[lhs] " SIGN #OP " %[rhs] goto %l[l_true]" \ + :: [lhs] "r"((short)LHS), [rhs] PRED (RHS) :: l_true); \ + ret = !DEFAULT; \ +l_true: \ + ret; \ + }) + +/* C type conversions coupled with comparison operator are tricky. + * Make sure BPF program is compiled with -Wsign-compare then + * __lhs OP __rhs below will catch the mistake. + * Be aware that we check only __lhs to figure out the sign of compare. 
+ */ +#define _bpf_cmp(LHS, OP, RHS, NOFLIP) \ + ({ \ + typeof(LHS) __lhs = (LHS); \ + typeof(RHS) __rhs = (RHS); \ + bool ret; \ + _Static_assert(sizeof(&(LHS)), "1st argument must be an lvalue expression"); \ + (void)(__lhs OP __rhs); \ + if (__cmp_cannot_be_signed(OP) || !__is_signed_type(typeof(__lhs))) { \ + if (sizeof(__rhs) == 8) \ + ret = __bpf_cmp(__lhs, OP, "", "r", __rhs, NOFLIP); \ + else \ + ret = __bpf_cmp(__lhs, OP, "", "i", __rhs, NOFLIP); \ + } else { \ + if (sizeof(__rhs) == 8) \ + ret = __bpf_cmp(__lhs, OP, "s", "r", __rhs, NOFLIP); \ + else \ + ret = __bpf_cmp(__lhs, OP, "s", "i", __rhs, NOFLIP); \ + } \ + ret; \ + }) + +#ifndef bpf_cmp_unlikely +#define bpf_cmp_unlikely(LHS, OP, RHS) _bpf_cmp(LHS, OP, RHS, true) +#endif + +#ifndef bpf_cmp_likely +#define bpf_cmp_likely(LHS, OP, RHS) \ + ({ \ + bool ret; \ + if (__builtin_strcmp(#OP, "==") == 0) \ + ret = _bpf_cmp(LHS, !=, RHS, false); \ + else if (__builtin_strcmp(#OP, "!=") == 0) \ + ret = _bpf_cmp(LHS, ==, RHS, false); \ + else if (__builtin_strcmp(#OP, "<=") == 0) \ + ret = _bpf_cmp(LHS, >, RHS, false); \ + else if (__builtin_strcmp(#OP, "<") == 0) \ + ret = _bpf_cmp(LHS, >=, RHS, false); \ + else if (__builtin_strcmp(#OP, ">") == 0) \ + ret = _bpf_cmp(LHS, <=, RHS, false); \ + else if (__builtin_strcmp(#OP, ">=") == 0) \ + ret = _bpf_cmp(LHS, <, RHS, false); \ + else \ + (void) "bug"; \ + ret; \ + }) +#endif + #endif diff --git a/tools/testing/selftests/bpf/progs/iters_task_vma.c b/tools/testing/selftests/bpf/progs/iters_task_vma.c index 44edecfdfaee..dc0c3691dcc2 100644 --- a/tools/testing/selftests/bpf/progs/iters_task_vma.c +++ b/tools/testing/selftests/bpf/progs/iters_task_vma.c @@ -28,7 +28,7 @@ int iter_task_vma_for_each(const void *ctx) return 0; bpf_for_each(task_vma, vma, task, 0) { - if (seen >= 1000) + if (bpf_cmp_unlikely(seen, >=, 1000)) break; vm_ranges[seen].vm_start = vma->vm_start; -- 2.34.1
From: Pu Lehui <pulehui@huawei.com> mainline inclusion from mainline-v6.8-rc2 commit 1732ebc4a26181c8f116c7639db99754b313edc8 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- We encountered a kernel crash triggered by the bpf_tcp_ca testcase as show below: Unable to handle kernel paging request at virtual address ff60000088554500 Oops [#1] ... CPU: 3 PID: 458 Comm: test_progs Tainted: G OE 6.8.0-rc1-kselftest_plain #1 Hardware name: riscv-virtio,qemu (DT) epc : 0xff60000088554500 ra : tcp_ack+0x288/0x1232 epc : ff60000088554500 ra : ffffffff80cc7166 sp : ff2000000117ba50 gp : ffffffff82587b60 tp : ff60000087be0040 t0 : ff60000088554500 t1 : ffffffff801ed24e t2 : 0000000000000000 s0 : ff2000000117bbc0 s1 : 0000000000000500 a0 : ff20000000691000 a1 : 0000000000000018 a2 : 0000000000000001 a3 : ff60000087be03a0 a4 : 0000000000000000 a5 : 0000000000000000 a6 : 0000000000000021 a7 : ffffffff8263f880 s2 : 000000004ac3c13b s3 : 000000004ac3c13a s4 : 0000000000008200 s5 : 0000000000000001 s6 : 0000000000000104 s7 : ff2000000117bb00 s8 : ff600000885544c0 s9 : 0000000000000000 s10: ff60000086ff0b80 s11: 000055557983a9c0 t3 : 0000000000000000 t4 : 000000000000ffc4 t5 : ffffffff8154f170 t6 : 0000000000000030 status: 0000000200000120 badaddr: ff60000088554500 cause: 000000000000000c Code: c796 67d7 0000 0000 0052 0002 c13b 4ac3 0000 0000 (0001) 0000 ---[ end trace 0000000000000000 ]--- The reason is that commit 2cd3e3772e41 ("x86/cfi,bpf: Fix bpf_struct_ops CFI") changes the func_addr of arch_prepare_bpf_trampoline in struct_ops from NULL to non-NULL, while we use func_addr on RV64 to differentiate between struct_ops and regular trampoline. When the struct_ops testcase is triggered, it emits wrong prologue and epilogue, and lead to unpredictable issues. After commit 2cd3e3772e41, we can use BPF_TRAMP_F_INDIRECT to distinguish them as it always be set in struct_ops. Fixes: 2cd3e3772e41 ("x86/cfi,bpf: Fix bpf_struct_ops CFI") Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Björn Töpel <bjorn@rivosinc.com> Acked-by: Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/bpf/20240123023207.1917284-1-pulehui@huaweicloud.com Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- arch/riscv/net/bpf_jit_comp64.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c index 4a1a61ae14d6..4ae7e6e7c061 100644 --- a/arch/riscv/net/bpf_jit_comp64.c +++ b/arch/riscv/net/bpf_jit_comp64.c @@ -796,6 +796,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, struct bpf_tramp_links *fentry = &tlinks[BPF_TRAMP_FENTRY]; struct bpf_tramp_links *fexit = &tlinks[BPF_TRAMP_FEXIT]; struct bpf_tramp_links *fmod_ret = &tlinks[BPF_TRAMP_MODIFY_RETURN]; + bool is_struct_ops = flags & BPF_TRAMP_F_INDIRECT; void *orig_call = func_addr; bool save_ret; u32 insn; @@ -878,7 +879,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, stack_size = round_up(stack_size, 16); - if (func_addr) { + if (!is_struct_ops) { /* For the trampoline called from function entry, * the frame of traced function and the frame of * trampoline need to be considered. 
@@ -998,7 +999,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, emit_ld(RV_REG_S1, -sreg_off, RV_REG_FP, ctx); - if (func_addr) { + if (!is_struct_ops) { /* trampoline called from function entry */ emit_ld(RV_REG_T0, stack_size - 8, RV_REG_SP, ctx); emit_ld(RV_REG_FP, stack_size - 16, RV_REG_SP, ctx); -- 2.34.1
From: Kui-Feng Lee <thinker.li@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit 3b1f89e747cd4b24244f2798a35d28815b744303 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Move the majority of the code to bpf_struct_ops_init_one(), which can then be utilized for the initialization of newly registered dynamically allocated struct_ops types in the following patches. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240119225005.668602-2-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- include/linux/btf.h | 1 + kernel/bpf/bpf_struct_ops.c | 157 +++++++++++++++++++----------------- kernel/bpf/btf.c | 5 ++ 3 files changed, 89 insertions(+), 74 deletions(-) diff --git a/include/linux/btf.h b/include/linux/btf.h index 59d404e22814..1d852dad7473 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -137,6 +137,7 @@ struct btf_struct_metas { extern const struct file_operations btf_fops; +const char *btf_get_name(const struct btf *btf); void btf_get(struct btf *btf); void btf_put(struct btf *btf); int btf_new_fd(const union bpf_attr *attr, bpfptr_t uattr, u32 uattr_sz); diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c index 02068bd0e4d9..96cba76f4ac3 100644 --- a/kernel/bpf/bpf_struct_ops.c +++ b/kernel/bpf/bpf_struct_ops.c @@ -110,102 +110,111 @@ const struct bpf_prog_ops bpf_struct_ops_prog_ops = { static const struct btf_type *module_type; -void bpf_struct_ops_init(struct btf *btf, struct bpf_verifier_log *log) +static void bpf_struct_ops_init_one(struct bpf_struct_ops *st_ops, + struct btf *btf, + struct bpf_verifier_log *log) { - s32 type_id, value_id, module_id; const struct btf_member *member; - struct bpf_struct_ops *st_ops; const struct btf_type *t; + s32 type_id, value_id; char value_name[128]; const char *mname; - u32 i, j; + int i; - /* Ensure BTF type is emitted for "struct bpf_struct_ops_##_name" */ -#define BPF_STRUCT_OPS_TYPE(_name) BTF_TYPE_EMIT(struct bpf_struct_ops_##_name); -#include "bpf_struct_ops_types.h" -#undef BPF_STRUCT_OPS_TYPE + if (strlen(st_ops->name) + VALUE_PREFIX_LEN >= + sizeof(value_name)) { + pr_warn("struct_ops name %s is too long\n", + st_ops->name); + return; + } + sprintf(value_name, "%s%s", VALUE_PREFIX, st_ops->name); - module_id = btf_find_by_name_kind(btf, "module", BTF_KIND_STRUCT); - if (module_id < 0) { - pr_warn("Cannot find struct module in btf_vmlinux\n"); + value_id = btf_find_by_name_kind(btf, value_name, + BTF_KIND_STRUCT); + if (value_id < 0) { + pr_warn("Cannot find struct %s in %s\n", + value_name, btf_get_name(btf)); return; } - module_type = btf_type_by_id(btf, module_id); - for (i = 0; i < ARRAY_SIZE(bpf_struct_ops); i++) { - st_ops = bpf_struct_ops[i]; + type_id = btf_find_by_name_kind(btf, st_ops->name, + BTF_KIND_STRUCT); + if (type_id < 0) { + pr_warn("Cannot find struct %s in %s\n", + st_ops->name, btf_get_name(btf)); + return; + } + t = btf_type_by_id(btf, type_id); + if (btf_type_vlen(t) > BPF_STRUCT_OPS_MAX_NR_MEMBERS) { + pr_warn("Cannot support #%u members in struct %s\n", + btf_type_vlen(t), st_ops->name); + return; + } - if (strlen(st_ops->name) + VALUE_PREFIX_LEN >= - sizeof(value_name)) { - pr_warn("struct_ops name %s is too long\n", + for_each_member(i, t, member) { + const 
struct btf_type *func_proto; + + mname = btf_name_by_offset(btf, member->name_off); + if (!*mname) { + pr_warn("anon member in struct %s is not supported\n", st_ops->name); - continue; + break; } - sprintf(value_name, "%s%s", VALUE_PREFIX, st_ops->name); - value_id = btf_find_by_name_kind(btf, value_name, - BTF_KIND_STRUCT); - if (value_id < 0) { - pr_warn("Cannot find struct %s in btf_vmlinux\n", - value_name); - continue; + if (__btf_member_bitfield_size(t, member)) { + pr_warn("bit field member %s in struct %s is not supported\n", + mname, st_ops->name); + break; } - type_id = btf_find_by_name_kind(btf, st_ops->name, - BTF_KIND_STRUCT); - if (type_id < 0) { - pr_warn("Cannot find struct %s in btf_vmlinux\n", - st_ops->name); - continue; - } - t = btf_type_by_id(btf, type_id); - if (btf_type_vlen(t) > BPF_STRUCT_OPS_MAX_NR_MEMBERS) { - pr_warn("Cannot support #%u members in struct %s\n", - btf_type_vlen(t), st_ops->name); - continue; + func_proto = btf_type_resolve_func_ptr(btf, + member->type, + NULL); + if (func_proto && + btf_distill_func_proto(log, btf, + func_proto, mname, + &st_ops->func_models[i])) { + pr_warn("Error in parsing func ptr %s in struct %s\n", + mname, st_ops->name); + break; } + } - for_each_member(j, t, member) { - const struct btf_type *func_proto; + if (i == btf_type_vlen(t)) { + if (st_ops->init(btf)) { + pr_warn("Error in init bpf_struct_ops %s\n", + st_ops->name); + } else { + st_ops->type_id = type_id; + st_ops->type = t; + st_ops->value_id = value_id; + st_ops->value_type = btf_type_by_id(btf, + value_id); + } + } +} - mname = btf_name_by_offset(btf, member->name_off); - if (!*mname) { - pr_warn("anon member in struct %s is not supported\n", - st_ops->name); - break; - } +void bpf_struct_ops_init(struct btf *btf, struct bpf_verifier_log *log) +{ + struct bpf_struct_ops *st_ops; + s32 module_id; + u32 i; - if (__btf_member_bitfield_size(t, member)) { - pr_warn("bit field member %s in struct %s is not supported\n", - mname, st_ops->name); - break; - } + /* Ensure BTF type is emitted for "struct bpf_struct_ops_##_name" */ +#define BPF_STRUCT_OPS_TYPE(_name) BTF_TYPE_EMIT(struct bpf_struct_ops_##_name); +#include "bpf_struct_ops_types.h" +#undef BPF_STRUCT_OPS_TYPE - func_proto = btf_type_resolve_func_ptr(btf, - member->type, - NULL); - if (func_proto && - btf_distill_func_proto(log, btf, - func_proto, mname, - &st_ops->func_models[j])) { - pr_warn("Error in parsing func ptr %s in struct %s\n", - mname, st_ops->name); - break; - } - } + module_id = btf_find_by_name_kind(btf, "module", BTF_KIND_STRUCT); + if (module_id < 0) { + pr_warn("Cannot find struct module in %s\n", btf_get_name(btf)); + return; + } + module_type = btf_type_by_id(btf, module_id); - if (j == btf_type_vlen(t)) { - if (st_ops->init(btf)) { - pr_warn("Error in init bpf_struct_ops %s\n", - st_ops->name); - } else { - st_ops->type_id = type_id; - st_ops->type = t; - st_ops->value_id = value_id; - st_ops->value_type = btf_type_by_id(btf, - value_id); - } - } + for (i = 0; i < ARRAY_SIZE(bpf_struct_ops); i++) { + st_ops = bpf_struct_ops[i]; + bpf_struct_ops_init_one(st_ops, btf, log); } } diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index 5d1413e95185..6bc791cc4807 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -1720,6 +1720,11 @@ static void btf_free_rcu(struct rcu_head *rcu) btf_free(btf); } +const char *btf_get_name(const struct btf *btf) +{ + return btf->name; +} + void btf_get(struct btf *btf) { refcount_inc(&btf->refcnt); -- 2.34.1
From: Kui-Feng Lee <thinker.li@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit 95678395386d45fa0a075d2e7a6866326a469d76 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Get ready to remove bpf_struct_ops_init() in the future. By using BTF_ID_LIST, it is possible to gather type information while building instead of runtime. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240119225005.668602-3-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- kernel/bpf/bpf_struct_ops.c | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c index 96cba76f4ac3..5b3ebcb435d0 100644 --- a/kernel/bpf/bpf_struct_ops.c +++ b/kernel/bpf/bpf_struct_ops.c @@ -108,7 +108,12 @@ const struct bpf_prog_ops bpf_struct_ops_prog_ops = { #endif }; -static const struct btf_type *module_type; +BTF_ID_LIST(st_ops_ids) +BTF_ID(struct, module) + +enum { + IDX_MODULE_ID, +}; static void bpf_struct_ops_init_one(struct bpf_struct_ops *st_ops, struct btf *btf, @@ -197,7 +202,6 @@ static void bpf_struct_ops_init_one(struct bpf_struct_ops *st_ops, void bpf_struct_ops_init(struct btf *btf, struct bpf_verifier_log *log) { struct bpf_struct_ops *st_ops; - s32 module_id; u32 i; /* Ensure BTF type is emitted for "struct bpf_struct_ops_##_name" */ @@ -205,13 +209,6 @@ void bpf_struct_ops_init(struct btf *btf, struct bpf_verifier_log *log) #include "bpf_struct_ops_types.h" #undef BPF_STRUCT_OPS_TYPE - module_id = btf_find_by_name_kind(btf, "module", BTF_KIND_STRUCT); - if (module_id < 0) { - pr_warn("Cannot find struct module in %s\n", btf_get_name(btf)); - return; - } - module_type = btf_type_by_id(btf, module_id); - for (i = 0; i < ARRAY_SIZE(bpf_struct_ops); i++) { st_ops = bpf_struct_ops[i]; bpf_struct_ops_init_one(st_ops, btf, log); @@ -387,6 +384,7 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, struct bpf_struct_ops_map *st_map = (struct bpf_struct_ops_map *)map; const struct bpf_struct_ops *st_ops = st_map->st_ops; struct bpf_struct_ops_value *uvalue, *kvalue; + const struct btf_type *module_type; const struct btf_member *member; const struct btf_type *t = st_ops->type; struct bpf_tramp_links *tlinks; @@ -434,6 +432,7 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, image = st_map->image; image_end = st_map->image + PAGE_SIZE; + module_type = btf_type_by_id(btf_vmlinux, st_ops_ids[IDX_MODULE_ID]); for_each_member(i, t, member) { const struct btf_type *mtype, *ptype; struct bpf_prog *prog; -- 2.34.1
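For readers unfamiliar with the mechanism: BTF_ID_LIST places placeholder words in the .BTF_ids section that resolve_btfids patches with the real type IDs at link time, so the former runtime btf_find_by_name_kind() lookup becomes a simple array read. A sketch of the same pattern with illustrative names:

#include <linux/btf.h>
#include <linux/btf_ids.h>

BTF_ID_LIST(example_type_ids)
BTF_ID(struct, module)		/* example_type_ids[0] */
BTF_ID(struct, task_struct)	/* example_type_ids[1] */

enum {
	EXAMPLE_IDX_MODULE,
	EXAMPLE_IDX_TASK,
};

/* IDs are already resolved by the time this runs; no string lookup. */
static const struct btf_type *example_module_type(void)
{
	return btf_type_by_id(btf_vmlinux, example_type_ids[EXAMPLE_IDX_MODULE]);
}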
From: Kui-Feng Lee <thinker.li@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit 4c5763ed996a61b51d721d0968d0df957826ea49 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Move some of members of bpf_struct_ops to bpf_struct_ops_desc. type_id is unavailabe in bpf_struct_ops anymore. Modules should get it from the btf received by kmod's init function. Cc: netdev@vger.kernel.org Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240119225005.668602-4-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Conflicts: include/linux/bpf.h [Fix conflicts because of kabi.] Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- include/linux/bpf.h | 17 +++++--- kernel/bpf/bpf_struct_ops.c | 80 +++++++++++++++++----------------- kernel/bpf/verifier.c | 8 ++-- net/bpf/bpf_dummy_struct_ops.c | 11 ++++- net/ipv4/bpf_tcp_ca.c | 8 +++- 5 files changed, 75 insertions(+), 49 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 3bb9b329340f..562e52f0f982 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1792,13 +1792,11 @@ struct bpf_struct_ops { void (*unreg)(void *kdata); int (*update)(void *kdata, void *old_kdata); int (*validate)(void *kdata); - const struct btf_type *type; - const struct btf_type *value_type; + void *cfi_stubs; const char *name; struct btf_func_model func_models[BPF_STRUCT_OPS_MAX_NR_MEMBERS]; u32 type_id; u32 value_id; - void *cfi_stubs; KABI_RESERVE(1) KABI_RESERVE(2) @@ -1806,9 +1804,18 @@ struct bpf_struct_ops { KABI_RESERVE(4) }; +struct bpf_struct_ops_desc { + struct bpf_struct_ops *st_ops; + + const struct btf_type *type; + const struct btf_type *value_type; + u32 type_id; + u32 value_id; +}; + #if defined(CONFIG_BPF_JIT) && defined(CONFIG_BPF_SYSCALL) #define BPF_MODULE_OWNER ((void *)((0xeB9FUL << 2) + POISON_POINTER_DELTA)) -const struct bpf_struct_ops *bpf_struct_ops_find(u32 type_id); +const struct bpf_struct_ops_desc *bpf_struct_ops_find(u32 type_id); void bpf_struct_ops_init(struct btf *btf, struct bpf_verifier_log *log); bool bpf_struct_ops_get(const void *kdata); void bpf_struct_ops_put(const void *kdata); @@ -1852,7 +1859,7 @@ int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr, union bpf_attr __user *uattr); #endif #else -static inline const struct bpf_struct_ops *bpf_struct_ops_find(u32 type_id) +static inline const struct bpf_struct_ops_desc *bpf_struct_ops_find(u32 type_id) { return NULL; } diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c index 5b3ebcb435d0..9774f7824e8b 100644 --- a/kernel/bpf/bpf_struct_ops.c +++ b/kernel/bpf/bpf_struct_ops.c @@ -32,7 +32,7 @@ struct bpf_struct_ops_value { struct bpf_struct_ops_map { struct bpf_map map; struct rcu_head rcu; - const struct bpf_struct_ops *st_ops; + const struct bpf_struct_ops_desc *st_ops_desc; /* protect map_update */ struct mutex lock; /* link has all the bpf_links that is populated @@ -92,9 +92,9 @@ enum { __NR_BPF_STRUCT_OPS_TYPE, }; -static struct bpf_struct_ops * const bpf_struct_ops[] = { +static struct bpf_struct_ops_desc bpf_struct_ops[] = { #define BPF_STRUCT_OPS_TYPE(_name) \ - [BPF_STRUCT_OPS_TYPE_##_name] = &bpf_##_name, + [BPF_STRUCT_OPS_TYPE_##_name] = { .st_ops = &bpf_##_name }, #include "bpf_struct_ops_types.h" #undef BPF_STRUCT_OPS_TYPE }; @@ -115,10 
+115,11 @@ enum { IDX_MODULE_ID, }; -static void bpf_struct_ops_init_one(struct bpf_struct_ops *st_ops, - struct btf *btf, - struct bpf_verifier_log *log) +static void bpf_struct_ops_desc_init(struct bpf_struct_ops_desc *st_ops_desc, + struct btf *btf, + struct bpf_verifier_log *log) { + struct bpf_struct_ops *st_ops = st_ops_desc->st_ops; const struct btf_member *member; const struct btf_type *t; s32 type_id, value_id; @@ -190,18 +191,18 @@ static void bpf_struct_ops_init_one(struct bpf_struct_ops *st_ops, pr_warn("Error in init bpf_struct_ops %s\n", st_ops->name); } else { - st_ops->type_id = type_id; - st_ops->type = t; - st_ops->value_id = value_id; - st_ops->value_type = btf_type_by_id(btf, - value_id); + st_ops_desc->type_id = type_id; + st_ops_desc->type = t; + st_ops_desc->value_id = value_id; + st_ops_desc->value_type = btf_type_by_id(btf, + value_id); } } } void bpf_struct_ops_init(struct btf *btf, struct bpf_verifier_log *log) { - struct bpf_struct_ops *st_ops; + struct bpf_struct_ops_desc *st_ops_desc; u32 i; /* Ensure BTF type is emitted for "struct bpf_struct_ops_##_name" */ @@ -210,14 +211,14 @@ void bpf_struct_ops_init(struct btf *btf, struct bpf_verifier_log *log) #undef BPF_STRUCT_OPS_TYPE for (i = 0; i < ARRAY_SIZE(bpf_struct_ops); i++) { - st_ops = bpf_struct_ops[i]; - bpf_struct_ops_init_one(st_ops, btf, log); + st_ops_desc = &bpf_struct_ops[i]; + bpf_struct_ops_desc_init(st_ops_desc, btf, log); } } extern struct btf *btf_vmlinux; -static const struct bpf_struct_ops * +static const struct bpf_struct_ops_desc * bpf_struct_ops_find_value(u32 value_id) { unsigned int i; @@ -226,14 +227,14 @@ bpf_struct_ops_find_value(u32 value_id) return NULL; for (i = 0; i < ARRAY_SIZE(bpf_struct_ops); i++) { - if (bpf_struct_ops[i]->value_id == value_id) - return bpf_struct_ops[i]; + if (bpf_struct_ops[i].value_id == value_id) + return &bpf_struct_ops[i]; } return NULL; } -const struct bpf_struct_ops *bpf_struct_ops_find(u32 type_id) +const struct bpf_struct_ops_desc *bpf_struct_ops_find(u32 type_id) { unsigned int i; @@ -241,8 +242,8 @@ const struct bpf_struct_ops *bpf_struct_ops_find(u32 type_id) return NULL; for (i = 0; i < ARRAY_SIZE(bpf_struct_ops); i++) { - if (bpf_struct_ops[i]->type_id == type_id) - return bpf_struct_ops[i]; + if (bpf_struct_ops[i].type_id == type_id) + return &bpf_struct_ops[i]; } return NULL; @@ -302,7 +303,7 @@ static void *bpf_struct_ops_map_lookup_elem(struct bpf_map *map, void *key) static void bpf_struct_ops_map_put_progs(struct bpf_struct_ops_map *st_map) { - const struct btf_type *t = st_map->st_ops->type; + const struct btf_type *t = st_map->st_ops_desc->type; u32 i; for (i = 0; i < btf_type_vlen(t); i++) { @@ -382,11 +383,12 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, void *value, u64 flags) { struct bpf_struct_ops_map *st_map = (struct bpf_struct_ops_map *)map; - const struct bpf_struct_ops *st_ops = st_map->st_ops; + const struct bpf_struct_ops_desc *st_ops_desc = st_map->st_ops_desc; + const struct bpf_struct_ops *st_ops = st_ops_desc->st_ops; struct bpf_struct_ops_value *uvalue, *kvalue; const struct btf_type *module_type; const struct btf_member *member; - const struct btf_type *t = st_ops->type; + const struct btf_type *t = st_ops_desc->type; struct bpf_tramp_links *tlinks; void *udata, *kdata; int prog_fd, err; @@ -399,7 +401,7 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, if (*(u32 *)key != 0) return -E2BIG; - err = check_zero_holes(st_ops->value_type, value); + err = 
check_zero_holes(st_ops_desc->value_type, value); if (err) return err; @@ -492,7 +494,7 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, } if (prog->type != BPF_PROG_TYPE_STRUCT_OPS || - prog->aux->attach_btf_id != st_ops->type_id || + prog->aux->attach_btf_id != st_ops_desc->type_id || prog->expected_attach_type != i) { bpf_prog_put(prog); err = -EINVAL; @@ -588,7 +590,7 @@ static long bpf_struct_ops_map_delete_elem(struct bpf_map *map, void *key) BPF_STRUCT_OPS_STATE_TOBEFREE); switch (prev_state) { case BPF_STRUCT_OPS_STATE_INUSE: - st_map->st_ops->unreg(&st_map->kvalue.data); + st_map->st_ops_desc->st_ops->unreg(&st_map->kvalue.data); bpf_map_put(map); return 0; case BPF_STRUCT_OPS_STATE_TOBEFREE: @@ -669,22 +671,22 @@ static int bpf_struct_ops_map_alloc_check(union bpf_attr *attr) static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr) { - const struct bpf_struct_ops *st_ops; + const struct bpf_struct_ops_desc *st_ops_desc; size_t st_map_size; struct bpf_struct_ops_map *st_map; const struct btf_type *t, *vt; struct bpf_map *map; int ret; - st_ops = bpf_struct_ops_find_value(attr->btf_vmlinux_value_type_id); - if (!st_ops) + st_ops_desc = bpf_struct_ops_find_value(attr->btf_vmlinux_value_type_id); + if (!st_ops_desc) return ERR_PTR(-ENOTSUPP); - vt = st_ops->value_type; + vt = st_ops_desc->value_type; if (attr->value_size != vt->size) return ERR_PTR(-EINVAL); - t = st_ops->type; + t = st_ops_desc->type; st_map_size = sizeof(*st_map) + /* kvalue stores the @@ -696,7 +698,7 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr) if (!st_map) return ERR_PTR(-ENOMEM); - st_map->st_ops = st_ops; + st_map->st_ops_desc = st_ops_desc; map = &st_map->map; ret = bpf_jit_charge_modmem(PAGE_SIZE); @@ -733,8 +735,8 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr) static u64 bpf_struct_ops_map_mem_usage(const struct bpf_map *map) { struct bpf_struct_ops_map *st_map = (struct bpf_struct_ops_map *)map; - const struct bpf_struct_ops *st_ops = st_map->st_ops; - const struct btf_type *vt = st_ops->value_type; + const struct bpf_struct_ops_desc *st_ops_desc = st_map->st_ops_desc; + const struct btf_type *vt = st_ops_desc->value_type; u64 usage; usage = sizeof(*st_map) + @@ -808,7 +810,7 @@ static void bpf_struct_ops_map_link_dealloc(struct bpf_link *link) /* st_link->map can be NULL if * bpf_struct_ops_link_create() fails to register. */ - st_map->st_ops->unreg(&st_map->kvalue.data); + st_map->st_ops_desc->st_ops->unreg(&st_map->kvalue.data); bpf_map_put(&st_map->map); } kfree(st_link); @@ -855,7 +857,7 @@ static int bpf_struct_ops_map_link_update(struct bpf_link *link, struct bpf_map if (!bpf_struct_ops_valid_to_reg(new_map)) return -EINVAL; - if (!st_map->st_ops->update) + if (!st_map->st_ops_desc->st_ops->update) return -EOPNOTSUPP; mutex_lock(&update_mutex); @@ -868,12 +870,12 @@ static int bpf_struct_ops_map_link_update(struct bpf_link *link, struct bpf_map old_st_map = container_of(old_map, struct bpf_struct_ops_map, map); /* The new and old struct_ops must be the same type. 
*/ - if (st_map->st_ops != old_st_map->st_ops) { + if (st_map->st_ops_desc != old_st_map->st_ops_desc) { err = -EINVAL; goto err_out; } - err = st_map->st_ops->update(st_map->kvalue.data, old_st_map->kvalue.data); + err = st_map->st_ops_desc->st_ops->update(st_map->kvalue.data, old_st_map->kvalue.data); if (err) goto err_out; @@ -924,7 +926,7 @@ int bpf_struct_ops_link_create(union bpf_attr *attr) if (err) goto err_out; - err = st_map->st_ops->reg(st_map->kvalue.data); + err = st_map->st_ops_desc->st_ops->reg(st_map->kvalue.data); if (err) { bpf_link_cleanup(&link_primer); link = NULL; diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index aee4419f4649..7968ab9e1265 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -20208,6 +20208,7 @@ static void print_verification_stats(struct bpf_verifier_env *env) static int check_struct_ops_btf_id(struct bpf_verifier_env *env) { const struct btf_type *t, *func_proto; + const struct bpf_struct_ops_desc *st_ops_desc; const struct bpf_struct_ops *st_ops; const struct btf_member *member; struct bpf_prog *prog = env->prog; @@ -20220,14 +20221,15 @@ static int check_struct_ops_btf_id(struct bpf_verifier_env *env) } btf_id = prog->aux->attach_btf_id; - st_ops = bpf_struct_ops_find(btf_id); - if (!st_ops) { + st_ops_desc = bpf_struct_ops_find(btf_id); + if (!st_ops_desc) { verbose(env, "attach_btf_id %u is not a supported struct\n", btf_id); return -ENOTSUPP; } + st_ops = st_ops_desc->st_ops; - t = st_ops->type; + t = st_ops_desc->type; member_idx = prog->expected_attach_type; if (member_idx >= btf_type_vlen(t)) { verbose(env, "attach to invalid member idx %u of struct %s\n", diff --git a/net/bpf/bpf_dummy_struct_ops.c b/net/bpf/bpf_dummy_struct_ops.c index 8906f7bdf4a9..ba2c58dba2da 100644 --- a/net/bpf/bpf_dummy_struct_ops.c +++ b/net/bpf/bpf_dummy_struct_ops.c @@ -22,6 +22,8 @@ struct bpf_dummy_ops_test_args { struct bpf_dummy_ops_state state; }; +static struct btf *bpf_dummy_ops_btf; + static struct bpf_dummy_ops_test_args * dummy_ops_init_args(const union bpf_attr *kattr, unsigned int nr) { @@ -90,9 +92,15 @@ int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr, void *image = NULL; unsigned int op_idx; int prog_ret; + s32 type_id; int err; - if (prog->aux->attach_btf_id != st_ops->type_id) + type_id = btf_find_by_name_kind(bpf_dummy_ops_btf, + bpf_bpf_dummy_ops.name, + BTF_KIND_STRUCT); + if (type_id < 0) + return -EINVAL; + if (prog->aux->attach_btf_id != type_id) return -EOPNOTSUPP; func_proto = prog->aux->attach_func_proto; @@ -148,6 +156,7 @@ int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr, static int bpf_dummy_init(struct btf *btf) { + bpf_dummy_ops_btf = btf; return 0; } diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c index ae8b15e6896f..dffd8828079b 100644 --- a/net/ipv4/bpf_tcp_ca.c +++ b/net/ipv4/bpf_tcp_ca.c @@ -20,6 +20,7 @@ static u32 unsupported_ops[] = { static const struct btf_type *tcp_sock_type; static u32 tcp_sock_id, sock_id; +static const struct btf_type *tcp_congestion_ops_type; static int bpf_tcp_ca_init(struct btf *btf) { @@ -36,6 +37,11 @@ static int bpf_tcp_ca_init(struct btf *btf) tcp_sock_id = type_id; tcp_sock_type = btf_type_by_id(btf, tcp_sock_id); + type_id = btf_find_by_name_kind(btf, "tcp_congestion_ops", BTF_KIND_STRUCT); + if (type_id < 0) + return -EINVAL; + tcp_congestion_ops_type = btf_type_by_id(btf, type_id); + return 0; } @@ -149,7 +155,7 @@ static u32 prog_ops_moff(const struct bpf_prog *prog) u32 midx; midx = 
prog->expected_attach_type; - t = bpf_tcp_congestion_ops.type; + t = tcp_congestion_ops_type; m = &btf_type_member(t)[midx]; return __btf_member_bit_offset(t, m) / 8; -- 2.34.1
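For illustration, a minimal sketch (not part of the patch above) of how a struct_ops subsystem's ->init(btf) callback can cache the BTF type of its own ops struct, mirroring the bpf_tcp_ca_init() change in this patch; "my_ops" is a hypothetical struct_ops type name:

#include <linux/bpf.h>
#include <linux/btf.h>

static const struct btf_type *my_ops_type;

static int my_ops_init(struct btf *btf)
{
	s32 type_id;

	/* Resolve our own struct type from the BTF handed to ->init() */
	type_id = btf_find_by_name_kind(btf, "my_ops", BTF_KIND_STRUCT);
	if (type_id < 0)
		return -EINVAL;
	my_ops_type = btf_type_by_id(btf, type_id);

	return 0;
}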
From: Kui-Feng Lee <thinker.li@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit e61995111a76633376419d1bccede8696e94e6e5 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Maintain a registry of registered struct_ops types in the per-btf (module) struct_ops_tab. This registry allows for easy lookup of struct_ops types that are registered by a specific module. It is a preparation work for supporting kernel module struct_ops in a latter patch. Each struct_ops will be registered under its own kernel module btf and will be stored in the newly added btf->struct_ops_tab. The bpf verifier and bpf syscall (e.g. prog and map cmd) can find the struct_ops and its btf type/size/id... information from btf->struct_ops_tab. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240119225005.668602-5-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- kernel/bpf/btf.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 55 insertions(+) diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index 6bc791cc4807..4ecdd08aec7f 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -245,6 +245,12 @@ struct btf_id_dtor_kfunc_tab { struct btf_id_dtor_kfunc dtors[]; }; +struct btf_struct_ops_tab { + u32 cnt; + u32 capacity; + struct bpf_struct_ops_desc ops[]; +}; + struct btf { void *data; struct btf_type **types; @@ -262,6 +268,7 @@ struct btf { struct btf_kfunc_set_tab *kfunc_set_tab; struct btf_id_dtor_kfunc_tab *dtor_kfunc_tab; struct btf_struct_metas *struct_meta_tab; + struct btf_struct_ops_tab *struct_ops_tab; /* split BTF support */ struct btf *base_btf; @@ -1701,11 +1708,20 @@ static void btf_free_struct_meta_tab(struct btf *btf) btf->struct_meta_tab = NULL; } +static void btf_free_struct_ops_tab(struct btf *btf) +{ + struct btf_struct_ops_tab *tab = btf->struct_ops_tab; + + kfree(tab); + btf->struct_ops_tab = NULL; +} + static void btf_free(struct btf *btf) { btf_free_struct_meta_tab(btf); btf_free_dtor_kfunc_tab(btf); btf_free_kfunc_set_tab(btf); + btf_free_struct_ops_tab(btf); kvfree(btf->types); kvfree(btf->resolved_sizes); kvfree(btf->resolved_ids); @@ -8878,3 +8894,42 @@ bool btf_type_ids_nocast_alias(struct bpf_verifier_log *log, return !strncmp(reg_name, arg_name, cmp_len); } + +static int +btf_add_struct_ops(struct btf *btf, struct bpf_struct_ops *st_ops) +{ + struct btf_struct_ops_tab *tab, *new_tab; + int i; + + tab = btf->struct_ops_tab; + if (!tab) { + tab = kzalloc(offsetof(struct btf_struct_ops_tab, ops[4]), + GFP_KERNEL); + if (!tab) + return -ENOMEM; + tab->capacity = 4; + btf->struct_ops_tab = tab; + } + + for (i = 0; i < tab->cnt; i++) + if (tab->ops[i].st_ops == st_ops) + return -EEXIST; + + if (tab->cnt == tab->capacity) { + new_tab = krealloc(tab, + offsetof(struct btf_struct_ops_tab, + ops[tab->capacity * 2]), + GFP_KERNEL); + if (!new_tab) + return -ENOMEM; + tab = new_tab; + tab->capacity *= 2; + btf->struct_ops_tab = tab; + } + + tab->ops[btf->struct_ops_tab->cnt].st_ops = st_ops; + + btf->struct_ops_tab->cnt++; + + return 0; +} -- 2.34.1
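For illustration only, a sketch of how the per-btf struct_ops_tab added here is meant to be consumed; the helper name is hypothetical, and the real lookup functions that walk tab->ops arrive in a later patch of this series:

static const struct bpf_struct_ops_desc *
find_struct_ops_desc(const struct btf_struct_ops_tab *tab, u32 type_id)
{
	u32 i;

	if (!tab)
		return NULL;

	/* A linear scan is fine: only a handful of struct_ops types exist per btf */
	for (i = 0; i < tab->cnt; i++) {
		if (tab->ops[i].type_id == type_id)
			return &tab->ops[i];
	}

	return NULL;
}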
From: Kui-Feng Lee <thinker.li@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit 47f4f657acd5d04c78c5c5ac7022cba9ce3b4a7d category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Once new struct_ops can be registered from modules, btf_vmlinux is no longer the only btf that struct_ops_map would face. st_map should remember what btf it should use to get type information. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240119225005.668602-6-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- kernel/bpf/bpf_struct_ops.c | 24 +++++++++++++++--------- 1 file changed, 15 insertions(+), 9 deletions(-) diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c index 9774f7824e8b..5ddcca4c4fba 100644 --- a/kernel/bpf/bpf_struct_ops.c +++ b/kernel/bpf/bpf_struct_ops.c @@ -46,6 +46,8 @@ struct bpf_struct_ops_map { * "links[]". */ void *image; + /* The owner moduler's btf. */ + struct btf *btf; /* uvalue->data stores the kernel struct * (e.g. tcp_congestion_ops) that is more useful * to userspace than the kvalue. For example, @@ -314,7 +316,7 @@ static void bpf_struct_ops_map_put_progs(struct bpf_struct_ops_map *st_map) } } -static int check_zero_holes(const struct btf_type *t, void *data) +static int check_zero_holes(const struct btf *btf, const struct btf_type *t, void *data) { const struct btf_member *member; u32 i, moff, msize, prev_mend = 0; @@ -326,8 +328,8 @@ static int check_zero_holes(const struct btf_type *t, void *data) memchr_inv(data + prev_mend, 0, moff - prev_mend)) return -EINVAL; - mtype = btf_type_by_id(btf_vmlinux, member->type); - mtype = btf_resolve_size(btf_vmlinux, mtype, &msize); + mtype = btf_type_by_id(btf, member->type); + mtype = btf_resolve_size(btf, mtype, &msize); if (IS_ERR(mtype)) return PTR_ERR(mtype); prev_mend = moff + msize; @@ -401,12 +403,12 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, if (*(u32 *)key != 0) return -E2BIG; - err = check_zero_holes(st_ops_desc->value_type, value); + err = check_zero_holes(st_map->btf, st_ops_desc->value_type, value); if (err) return err; uvalue = value; - err = check_zero_holes(t, uvalue->data); + err = check_zero_holes(st_map->btf, t, uvalue->data); if (err) return err; @@ -442,7 +444,7 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, u32 moff; moff = __btf_member_bit_offset(t, member) / 8; - ptype = btf_type_resolve_ptr(btf_vmlinux, member->type, NULL); + ptype = btf_type_resolve_ptr(st_map->btf, member->type, NULL); if (ptype == module_type) { if (*(void **)(udata + moff)) goto reset_unlock; @@ -467,8 +469,8 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, if (!ptype || !btf_type_is_func_proto(ptype)) { u32 msize; - mtype = btf_type_by_id(btf_vmlinux, member->type); - mtype = btf_resolve_size(btf_vmlinux, mtype, &msize); + mtype = btf_type_by_id(st_map->btf, member->type); + mtype = btf_resolve_size(st_map->btf, mtype, &msize); if (IS_ERR(mtype)) { err = PTR_ERR(mtype); goto reset_unlock; @@ -607,6 +609,7 @@ static long bpf_struct_ops_map_delete_elem(struct bpf_map *map, void *key) static void bpf_struct_ops_map_seq_show_elem(struct bpf_map *map, void *key, struct seq_file *m) { + struct bpf_struct_ops_map *st_map = 
(struct bpf_struct_ops_map *)map; void *value; int err; @@ -616,7 +619,8 @@ static void bpf_struct_ops_map_seq_show_elem(struct bpf_map *map, void *key, err = bpf_struct_ops_map_sys_lookup_elem(map, key, value); if (!err) { - btf_type_seq_show(btf_vmlinux, map->btf_vmlinux_value_type_id, + btf_type_seq_show(st_map->btf, + map->btf_vmlinux_value_type_id, value, m); seq_puts(m, "\n"); } @@ -726,6 +730,8 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr) return ERR_PTR(-ENOMEM); } + st_map->btf = btf_vmlinux; + mutex_init(&st_map->lock); bpf_map_init_from_attr(map, attr); -- 2.34.1
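A small sketch of the rule this patch enforces (the helper name is hypothetical): member type ids stored in a struct_ops map's value type are only meaningful against the BTF the map was created from, so every resolution must go through st_map->btf rather than the global btf_vmlinux:

static const struct btf_type *
st_map_member_type(const struct bpf_struct_ops_map *st_map,
		   const struct btf_member *member, u32 *size)
{
	const struct btf_type *mtype;

	/* member->type is an id in st_map->btf, not necessarily in btf_vmlinux */
	mtype = btf_type_by_id(st_map->btf, member->type);
	return btf_resolve_size(st_map->btf, mtype, size);
}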
From: Kui-Feng Lee <thinker.li@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit 1338b93346587a2a6ac79bbcf55ef5b357745573 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Include btf object id (btf_obj_id) in bpf_map_info so that tools (ex: bpftools struct_ops dump) know the correct btf from the kernel to look up type information of struct_ops types. Since struct_ops types can be defined and registered in a module. The type information of a struct_ops type are defined in the btf of the module defining it. The userspace tools need to know which btf is for the module defining a struct_ops type. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240119225005.668602-7-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- include/linux/bpf.h | 4 ++++ include/uapi/linux/bpf.h | 2 +- kernel/bpf/bpf_struct_ops.c | 7 +++++++ kernel/bpf/syscall.c | 2 ++ tools/include/uapi/linux/bpf.h | 2 +- 5 files changed, 15 insertions(+), 2 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 562e52f0f982..565077321d8c 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1858,6 +1858,7 @@ struct bpf_dummy_ops { int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr, union bpf_attr __user *uattr); #endif +void bpf_map_struct_ops_info_fill(struct bpf_map_info *info, struct bpf_map *map); #else static inline const struct bpf_struct_ops_desc *bpf_struct_ops_find(u32 type_id) { @@ -1885,6 +1886,9 @@ static inline int bpf_struct_ops_link_create(union bpf_attr *attr) { return -EOPNOTSUPP; } +static inline void bpf_map_struct_ops_info_fill(struct bpf_map_info *info, struct bpf_map *map) +{ +} #endif diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 31e5b8061fe5..c33161c17663 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -6485,7 +6485,7 @@ struct bpf_map_info { __u32 btf_id; __u32 btf_key_type_id; __u32 btf_value_type_id; - __u32 :32; /* alignment pad */ + __u32 btf_vmlinux_id; __u64 map_extra; } __attribute__((aligned(8))); diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c index 5ddcca4c4fba..5e98af4fc2e2 100644 --- a/kernel/bpf/bpf_struct_ops.c +++ b/kernel/bpf/bpf_struct_ops.c @@ -947,3 +947,10 @@ int bpf_struct_ops_link_create(union bpf_attr *attr) kfree(link); return err; } + +void bpf_map_struct_ops_info_fill(struct bpf_map_info *info, struct bpf_map *map) +{ + struct bpf_struct_ops_map *st_map = (struct bpf_struct_ops_map *)map; + + info->btf_vmlinux_id = btf_obj_id(st_map->btf); +} diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 81ab8e889c7d..0d6d3e6d673c 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -4767,6 +4767,8 @@ static int bpf_map_get_info_by_fd(struct file *file, info.btf_value_type_id = map->btf_value_type_id; } info.btf_vmlinux_value_type_id = map->btf_vmlinux_value_type_id; + if (map->map_type == BPF_MAP_TYPE_STRUCT_OPS) + bpf_map_struct_ops_info_fill(&info, map); if (bpf_map_is_offloaded(map)) { err = bpf_map_offload_info_fill(&info, map); diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 7be085c034f7..0d27f3fd97c0 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ 
-6488,7 +6488,7 @@ struct bpf_map_info { __u32 btf_id; __u32 btf_key_type_id; __u32 btf_value_type_id; - __u32 :32; /* alignment pad */ + __u32 btf_vmlinux_id; __u64 map_extra; } __attribute__((aligned(8))); -- 2.34.1
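A hedged userspace sketch of how a tool (e.g. bpftool) can consume the new btf_vmlinux_id field to fetch the BTF object that defines a struct_ops map's value type; map_fd is assumed to be an fd of a struct_ops map:

#include <string.h>
#include <bpf/bpf.h>

static int get_struct_ops_btf_fd(int map_fd)
{
	struct bpf_map_info info;
	__u32 len = sizeof(info);
	int err;

	memset(&info, 0, sizeof(info));
	err = bpf_map_get_info_by_fd(map_fd, &info, &len);
	if (err)
		return err;

	/* btf_vmlinux_id identifies the vmlinux or module BTF object that
	 * defines the map's value type; turn it into an fd for dumping.
	 */
	return bpf_btf_get_fd_by_id(info.btf_vmlinux_id);
}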
From: Kui-Feng Lee <thinker.li@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit 689423db3bda2244c24db8a64de4cdb37be1de41 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- This is a preparation for searching for struct_ops types from a specified module. BTF is always btf_vmlinux now. This patch passes a pointer of BTF to bpf_struct_ops_find_value() and bpf_struct_ops_find(). Once the new registration API of struct_ops types is used, other BTFs besides btf_vmlinux can also be passed to them. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240119225005.668602-8-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- include/linux/bpf.h | 4 ++-- kernel/bpf/bpf_struct_ops.c | 11 ++++++----- kernel/bpf/verifier.c | 2 +- 3 files changed, 9 insertions(+), 8 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 565077321d8c..8720049084c7 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1815,7 +1815,7 @@ struct bpf_struct_ops_desc { #if defined(CONFIG_BPF_JIT) && defined(CONFIG_BPF_SYSCALL) #define BPF_MODULE_OWNER ((void *)((0xeB9FUL << 2) + POISON_POINTER_DELTA)) -const struct bpf_struct_ops_desc *bpf_struct_ops_find(u32 type_id); +const struct bpf_struct_ops_desc *bpf_struct_ops_find(struct btf *btf, u32 type_id); void bpf_struct_ops_init(struct btf *btf, struct bpf_verifier_log *log); bool bpf_struct_ops_get(const void *kdata); void bpf_struct_ops_put(const void *kdata); @@ -1860,7 +1860,7 @@ int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr, #endif void bpf_map_struct_ops_info_fill(struct bpf_map_info *info, struct bpf_map *map); #else -static inline const struct bpf_struct_ops_desc *bpf_struct_ops_find(u32 type_id) +static inline const struct bpf_struct_ops_desc *bpf_struct_ops_find(struct btf *btf, u32 type_id) { return NULL; } diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c index 5e98af4fc2e2..7505f515aac3 100644 --- a/kernel/bpf/bpf_struct_ops.c +++ b/kernel/bpf/bpf_struct_ops.c @@ -221,11 +221,11 @@ void bpf_struct_ops_init(struct btf *btf, struct bpf_verifier_log *log) extern struct btf *btf_vmlinux; static const struct bpf_struct_ops_desc * -bpf_struct_ops_find_value(u32 value_id) +bpf_struct_ops_find_value(struct btf *btf, u32 value_id) { unsigned int i; - if (!value_id || !btf_vmlinux) + if (!value_id || !btf) return NULL; for (i = 0; i < ARRAY_SIZE(bpf_struct_ops); i++) { @@ -236,11 +236,12 @@ bpf_struct_ops_find_value(u32 value_id) return NULL; } -const struct bpf_struct_ops_desc *bpf_struct_ops_find(u32 type_id) +const struct bpf_struct_ops_desc * +bpf_struct_ops_find(struct btf *btf, u32 type_id) { unsigned int i; - if (!type_id || !btf_vmlinux) + if (!type_id || !btf) return NULL; for (i = 0; i < ARRAY_SIZE(bpf_struct_ops); i++) { @@ -682,7 +683,7 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr) struct bpf_map *map; int ret; - st_ops_desc = bpf_struct_ops_find_value(attr->btf_vmlinux_value_type_id); + st_ops_desc = bpf_struct_ops_find_value(btf_vmlinux, attr->btf_vmlinux_value_type_id); if (!st_ops_desc) return ERR_PTR(-ENOTSUPP); diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 7968ab9e1265..3dd3da7937a0 100644 --- a/kernel/bpf/verifier.c 
+++ b/kernel/bpf/verifier.c @@ -20221,7 +20221,7 @@ static int check_struct_ops_btf_id(struct bpf_verifier_env *env) } btf_id = prog->aux->attach_btf_id; - st_ops_desc = bpf_struct_ops_find(btf_id); + st_ops_desc = bpf_struct_ops_find(btf_vmlinux, btf_id); if (!st_ops_desc) { verbose(env, "attach_btf_id %u is not a supported struct\n", btf_id); -- 2.34.1
From: Kui-Feng Lee <thinker.li@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit fcc2c1fb0651477c8ed78a3a293c175ccd70697a category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Pass the fd of a btf from the userspace to the bpf() syscall, and then convert the fd into a btf. The btf is generated from the module that defines the target BPF struct_ops type. In order to inform the kernel about the module that defines the target struct_ops type, the userspace program needs to provide a btf fd for the respective module's btf. This btf contains essential information on the types defined within the module, including the target struct_ops type. A btf fd must be provided to the kernel for struct_ops maps and for the bpf programs attached to those maps. In the case of the bpf programs, the attach_btf_obj_fd parameter is passed as part of the bpf_attr and is converted into a btf. This btf is then stored in the prog->aux->attach_btf field. Here, it just let the verifier access attach_btf directly. In the case of struct_ops maps, a btf fd is passed as value_type_btf_obj_fd of bpf_attr. The bpf_struct_ops_map_alloc() function converts the fd to a btf and stores it as st_map->btf. A flag BPF_F_VTYPE_BTF_OBJ_FD is added for map_flags to indicate that the value of value_type_btf_obj_fd is set. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240119225005.668602-9-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- include/uapi/linux/bpf.h | 8 +++++ kernel/bpf/bpf_struct_ops.c | 65 ++++++++++++++++++++++++---------- kernel/bpf/syscall.c | 2 +- kernel/bpf/verifier.c | 9 +++-- tools/include/uapi/linux/bpf.h | 8 +++++ 5 files changed, 70 insertions(+), 22 deletions(-) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index c33161c17663..da549108042a 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -1336,6 +1336,9 @@ enum { /* Get path from provided FD in BPF_OBJ_PIN/BPF_OBJ_GET commands */ BPF_F_PATH_FD = (1U << 14), + +/* Flag for value_type_btf_obj_fd, the fd is available */ + BPF_F_VTYPE_BTF_OBJ_FD = (1U << 15), }; /* Flags for BPF_PROG_QUERY. */ @@ -1409,6 +1412,11 @@ union bpf_attr { * to using 5 hash functions). */ __u64 map_extra; + + __s32 value_type_btf_obj_fd; /* fd pointing to a BTF + * type data for + * btf_vmlinux_value_type_id. 
+ */ }; struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */ diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c index 7505f515aac3..3b8d689ece5d 100644 --- a/kernel/bpf/bpf_struct_ops.c +++ b/kernel/bpf/bpf_struct_ops.c @@ -641,6 +641,7 @@ static void __bpf_struct_ops_map_free(struct bpf_map *map) bpf_jit_uncharge_modmem(PAGE_SIZE); } bpf_map_area_free(st_map->uvalue); + btf_put(st_map->btf); bpf_map_area_free(st_map); } @@ -669,7 +670,8 @@ static void bpf_struct_ops_map_free(struct bpf_map *map) static int bpf_struct_ops_map_alloc_check(union bpf_attr *attr) { if (attr->key_size != sizeof(unsigned int) || attr->max_entries != 1 || - (attr->map_flags & ~BPF_F_LINK) || !attr->btf_vmlinux_value_type_id) + (attr->map_flags & ~(BPF_F_LINK | BPF_F_VTYPE_BTF_OBJ_FD)) || + !attr->btf_vmlinux_value_type_id) return -EINVAL; return 0; } @@ -681,15 +683,36 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr) struct bpf_struct_ops_map *st_map; const struct btf_type *t, *vt; struct bpf_map *map; + struct btf *btf; int ret; - st_ops_desc = bpf_struct_ops_find_value(btf_vmlinux, attr->btf_vmlinux_value_type_id); - if (!st_ops_desc) - return ERR_PTR(-ENOTSUPP); + if (attr->map_flags & BPF_F_VTYPE_BTF_OBJ_FD) { + /* The map holds btf for its whole life time. */ + btf = btf_get_by_fd(attr->value_type_btf_obj_fd); + if (IS_ERR(btf)) + return ERR_CAST(btf); + if (!btf_is_module(btf)) { + btf_put(btf); + return ERR_PTR(-EINVAL); + } + } else { + btf = bpf_get_btf_vmlinux(); + if (IS_ERR(btf)) + return ERR_CAST(btf); + btf_get(btf); + } + + st_ops_desc = bpf_struct_ops_find_value(btf, attr->btf_vmlinux_value_type_id); + if (!st_ops_desc) { + ret = -ENOTSUPP; + goto errout; + } vt = st_ops_desc->value_type; - if (attr->value_size != vt->size) - return ERR_PTR(-EINVAL); + if (attr->value_size != vt->size) { + ret = -EINVAL; + goto errout; + } t = st_ops_desc->type; @@ -700,17 +723,17 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr) (vt->size - sizeof(struct bpf_struct_ops_value)); st_map = bpf_map_area_alloc(st_map_size, NUMA_NO_NODE); - if (!st_map) - return ERR_PTR(-ENOMEM); + if (!st_map) { + ret = -ENOMEM; + goto errout; + } st_map->st_ops_desc = st_ops_desc; map = &st_map->map; ret = bpf_jit_charge_modmem(PAGE_SIZE); - if (ret) { - __bpf_struct_ops_map_free(map); - return ERR_PTR(ret); - } + if (ret) + goto errout_free; st_map->image = arch_alloc_bpf_trampoline(PAGE_SIZE); if (!st_map->image) { @@ -719,24 +742,30 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr) * here. 
*/ bpf_jit_uncharge_modmem(PAGE_SIZE); - __bpf_struct_ops_map_free(map); - return ERR_PTR(-ENOMEM); + ret = -ENOMEM; + goto errout_free; } st_map->uvalue = bpf_map_area_alloc(vt->size, NUMA_NO_NODE); st_map->links = bpf_map_area_alloc(btf_type_vlen(t) * sizeof(struct bpf_links *), NUMA_NO_NODE); if (!st_map->uvalue || !st_map->links) { - __bpf_struct_ops_map_free(map); - return ERR_PTR(-ENOMEM); + ret = -ENOMEM; + goto errout_free; } - - st_map->btf = btf_vmlinux; + st_map->btf = btf; mutex_init(&st_map->lock); bpf_map_init_from_attr(map, attr); return map; + +errout_free: + __bpf_struct_ops_map_free(map); +errout: + btf_put(btf); + + return ERR_PTR(ret); } static u64 bpf_struct_ops_map_mem_usage(const struct bpf_map *map) diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 0d6d3e6d673c..55beb18bc371 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -1135,7 +1135,7 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf, return ret; } -#define BPF_MAP_CREATE_LAST_FIELD map_extra +#define BPF_MAP_CREATE_LAST_FIELD value_type_btf_obj_fd /* called via syscall */ static int map_create(union bpf_attr *attr) { diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 3dd3da7937a0..6173c669b8ce 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -20213,6 +20213,7 @@ static int check_struct_ops_btf_id(struct bpf_verifier_env *env) const struct btf_member *member; struct bpf_prog *prog = env->prog; u32 btf_id, member_idx; + struct btf *btf; const char *mname; if (!prog->gpl_compatible) { @@ -20220,8 +20221,10 @@ static int check_struct_ops_btf_id(struct bpf_verifier_env *env) return -EINVAL; } + btf = prog->aux->attach_btf ?: bpf_get_btf_vmlinux(); + btf_id = prog->aux->attach_btf_id; - st_ops_desc = bpf_struct_ops_find(btf_vmlinux, btf_id); + st_ops_desc = bpf_struct_ops_find(btf, btf_id); if (!st_ops_desc) { verbose(env, "attach_btf_id %u is not a supported struct\n", btf_id); @@ -20238,8 +20241,8 @@ static int check_struct_ops_btf_id(struct bpf_verifier_env *env) } member = &btf_type_member(t)[member_idx]; - mname = btf_name_by_offset(btf_vmlinux, member->name_off); - func_proto = btf_type_resolve_func_ptr(btf_vmlinux, member->type, + mname = btf_name_by_offset(btf, member->name_off); + func_proto = btf_type_resolve_func_ptr(btf, member->type, NULL); if (!func_proto) { verbose(env, "attach to invalid member %s(@idx %u) of struct %s\n", diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 0d27f3fd97c0..7b9d03897c37 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -1336,6 +1336,9 @@ enum { /* Get path from provided FD in BPF_OBJ_PIN/BPF_OBJ_GET commands */ BPF_F_PATH_FD = (1U << 14), + +/* Flag for value_type_btf_obj_fd, the fd is available */ + BPF_F_VTYPE_BTF_OBJ_FD = (1U << 15), }; /* Flags for BPF_PROG_QUERY. */ @@ -1409,6 +1412,11 @@ union bpf_attr { * to using 5 hash functions). */ __u64 map_extra; + + __s32 value_type_btf_obj_fd; /* fd pointing to a BTF + * type data for + * btf_vmlinux_value_type_id. + */ }; struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */ -- 2.34.1
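A hedged sketch of the new UAPI from the map side: creating a struct_ops map whose value type lives in a module's BTF, using the raw bpf() syscall. mod_btf_fd, value_type_id and value_size are assumed to have been obtained from the module's BTF beforehand:

#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

static int create_module_struct_ops_map(int mod_btf_fd, __u32 value_type_id,
					 __u32 value_size)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.map_type = BPF_MAP_TYPE_STRUCT_OPS;
	attr.key_size = sizeof(unsigned int);	/* struct_ops maps require a 4-byte key */
	attr.value_size = value_size;
	attr.max_entries = 1;
	attr.btf_vmlinux_value_type_id = value_type_id;
	/* Tell the kernel that value_type_btf_obj_fd is valid and points at
	 * the module BTF defining the value type.
	 */
	attr.map_flags = BPF_F_VTYPE_BTF_OBJ_FD;
	attr.value_type_btf_obj_fd = mod_btf_fd;

	return syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
}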
hulk inclusion category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA -------------------------------- Fix kABI breakage by using a KABI-related macro guard, because the uapi header cannot include kabi.h. This is safe because bpf_attr is a union and pahole shows that another member structure is larger than the modified one, so the union's layout is unchanged. Fixes: 526d613854fe ("bpf: pass attached BTF to the bpf_struct_ops subsystem") Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- include/uapi/linux/bpf.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index da549108042a..fc59c9ee3988 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -1413,10 +1413,12 @@ union bpf_attr { */ __u64 map_extra; +#if !defined(__GENKSYMS__) __s32 value_type_btf_obj_fd; /* fd pointing to a BTF * type data for * btf_vmlinux_value_type_id. */ +#endif }; struct { /* anonymous struct used by BPF_MAP_*_ELEM commands */ -- 2.34.1
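The general pattern behind this fix, shown on a hypothetical union for clarity: the new member is hidden from genksyms so the symbol CRCs (and thus kABI) stay stable, while normal builds still see it. This only works because another union member is at least as large, so the layout does not change:

#include <linux/types.h>

union example_attr {
	struct {
		__u64 existing_field;
#if !defined(__GENKSYMS__)
		__s32 new_field;	/* invisible to genksyms, visible to the compiler */
#endif
	};
	struct {
		__u64 larger_member[4];	/* keeps the union size fixed either way */
	} other;
};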
From: Kui-Feng Lee <thinker.li@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit e3f87fdfed7b770dd7066b02262b12747881e76d category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- To ensure that a module remains accessible whenever a struct_ops object of a struct_ops type provided by the module is still in use. struct bpf_struct_ops_map doesn't hold a refcnt to btf anymore since a module will hold a refcnt to it's btf already. But, struct_ops programs are different. They hold their associated btf, not the module since they need only btf to assure their types (signatures). However, verifier holds the refcnt of the associated module of a struct_ops type temporarily when verify a struct_ops prog. Verifier needs the help from the verifier operators (struct bpf_verifier_ops) provided by the owner module to verify data access of a prog, provide information, and generate code. This patch also add a count of links (links_cnt) to bpf_struct_ops_map. It avoids bpf_struct_ops_map_put_progs() from accessing btf after calling module_put() in bpf_struct_ops_map_free(). Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240119225005.668602-10-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Conflicts: include/linux/bpf.h [Fix conflicts because of kabi.] Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- include/linux/bpf.h | 1 + include/linux/bpf_verifier.h | 1 + kernel/bpf/bpf_struct_ops.c | 29 +++++++++++++++++++++++------ kernel/bpf/verifier.c | 11 +++++++++++ 4 files changed, 36 insertions(+), 6 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 8720049084c7..a4a6bf41abce 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1794,6 +1794,7 @@ struct bpf_struct_ops { int (*validate)(void *kdata); void *cfi_stubs; const char *name; + struct module *owner; struct btf_func_model func_models[BPF_STRUCT_OPS_MAX_NR_MEMBERS]; u32 type_id; u32 value_id; diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index e1d213f2585d..badc370db9f9 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -681,6 +681,7 @@ struct bpf_verifier_env { u32 prev_insn_idx; struct bpf_prog *prog; /* eBPF program being verified */ const struct bpf_verifier_ops *ops; + struct module *attach_btf_mod; /* The owner module of prog->aux->attach_btf */ struct bpf_verifier_stack_elem *head; /* stack of verifier states to be processed */ int stack_size; /* number of states to be processed */ bool strict_alignment; /* perform strict pointer alignment checks */ diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c index 3b8d689ece5d..02216a8d9265 100644 --- a/kernel/bpf/bpf_struct_ops.c +++ b/kernel/bpf/bpf_struct_ops.c @@ -40,6 +40,7 @@ struct bpf_struct_ops_map { * (in kvalue.data). */ struct bpf_link **links; + u32 links_cnt; /* image is a page that has all the trampolines * that stores the func args before calling the bpf_prog. 
* A PAGE_SIZE "image" is enough to store all trampoline for @@ -306,10 +307,9 @@ static void *bpf_struct_ops_map_lookup_elem(struct bpf_map *map, void *key) static void bpf_struct_ops_map_put_progs(struct bpf_struct_ops_map *st_map) { - const struct btf_type *t = st_map->st_ops_desc->type; u32 i; - for (i = 0; i < btf_type_vlen(t); i++) { + for (i = 0; i < st_map->links_cnt; i++) { if (st_map->links[i]) { bpf_link_put(st_map->links[i]); st_map->links[i] = NULL; @@ -641,12 +641,20 @@ static void __bpf_struct_ops_map_free(struct bpf_map *map) bpf_jit_uncharge_modmem(PAGE_SIZE); } bpf_map_area_free(st_map->uvalue); - btf_put(st_map->btf); bpf_map_area_free(st_map); } static void bpf_struct_ops_map_free(struct bpf_map *map) { + struct bpf_struct_ops_map *st_map = (struct bpf_struct_ops_map *)map; + + /* st_ops->owner was acquired during map_alloc to implicitly holds + * the btf's refcnt. The acquire was only done when btf_is_module() + * st_map->btf cannot be NULL here. + */ + if (btf_is_module(st_map->btf)) + module_put(st_map->st_ops_desc->st_ops->owner); + /* The struct_ops's function may switch to another struct_ops. * * For example, bpf_tcp_cc_x->init() may switch to @@ -682,6 +690,7 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr) size_t st_map_size; struct bpf_struct_ops_map *st_map; const struct btf_type *t, *vt; + struct module *mod = NULL; struct bpf_map *map; struct btf *btf; int ret; @@ -695,11 +704,18 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr) btf_put(btf); return ERR_PTR(-EINVAL); } + + mod = btf_try_get_module(btf); + /* mod holds a refcnt to btf. We don't need an extra refcnt + * here. + */ + btf_put(btf); + if (!mod) + return ERR_PTR(-EINVAL); } else { btf = bpf_get_btf_vmlinux(); if (IS_ERR(btf)) return ERR_CAST(btf); - btf_get(btf); } st_ops_desc = bpf_struct_ops_find_value(btf, attr->btf_vmlinux_value_type_id); @@ -746,8 +762,9 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr) goto errout_free; } st_map->uvalue = bpf_map_area_alloc(vt->size, NUMA_NO_NODE); + st_map->links_cnt = btf_type_vlen(t); st_map->links = - bpf_map_area_alloc(btf_type_vlen(t) * sizeof(struct bpf_links *), + bpf_map_area_alloc(st_map->links_cnt * sizeof(struct bpf_links *), NUMA_NO_NODE); if (!st_map->uvalue || !st_map->links) { ret = -ENOMEM; @@ -763,7 +780,7 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr) errout_free: __bpf_struct_ops_map_free(map); errout: - btf_put(btf); + module_put(mod); return ERR_PTR(ret); } diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 6173c669b8ce..a3df4861690a 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -20222,6 +20222,15 @@ static int check_struct_ops_btf_id(struct bpf_verifier_env *env) } btf = prog->aux->attach_btf ?: bpf_get_btf_vmlinux(); + if (btf_is_module(btf)) { + /* Make sure st_ops is valid through the lifetime of env */ + env->attach_btf_mod = btf_try_get_module(btf); + if (!env->attach_btf_mod) { + verbose(env, "struct_ops module %s is not found\n", + btf_get_name(btf)); + return -ENOTSUPP; + } + } btf_id = prog->aux->attach_btf_id; st_ops_desc = bpf_struct_ops_find(btf, btf_id); @@ -20989,6 +20998,8 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr, bpfptr_t uattr, __u3 env->prog->expected_attach_type = 0; *prog = env->prog; + + module_put(env->attach_btf_mod); err_unlock: if (!is_priv) mutex_unlock(&bpf_verifier_lock); -- 2.34.1
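A minimal sketch (helper names hypothetical) of the pinning rule this patch establishes: any long-lived user of a module-provided struct_ops type pins the owner module rather than the BTF itself, and the module in turn keeps its BTF alive:

#include <linux/btf.h>
#include <linux/module.h>

static int pin_struct_ops_owner(struct btf *btf, struct module **modp)
{
	*modp = NULL;

	/* vmlinux BTF: nothing to pin, it can never go away */
	if (!btf_is_module(btf))
		return 0;

	*modp = btf_try_get_module(btf);
	if (!*modp)
		return -ENOTSUPP;	/* module is on its way out */

	return 0;
}

static void unpin_struct_ops_owner(struct module *mod)
{
	module_put(mod);	/* module_put(NULL) is a no-op */
}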
hulk inclusion category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA -------------------------------- Fix kabi-breakage by using KABI_USE. Fixes: 55cc81adffd9 ("bpf: hold module refcnt in bpf_struct_ops map creation and prog verification.") Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- include/linux/bpf_verifier.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index badc370db9f9..eeef504067e6 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -681,7 +681,6 @@ struct bpf_verifier_env { u32 prev_insn_idx; struct bpf_prog *prog; /* eBPF program being verified */ const struct bpf_verifier_ops *ops; - struct module *attach_btf_mod; /* The owner module of prog->aux->attach_btf */ struct bpf_verifier_stack_elem *head; /* stack of verifier states to be processed */ int stack_size; /* number of states to be processed */ bool strict_alignment; /* perform strict pointer alignment checks */ @@ -749,7 +748,7 @@ struct bpf_verifier_env { char tmp_str_buf[TMP_STR_BUF_LEN]; KABI_USE(1, struct bpf_jmp_history_entry *cur_hist_ent) - KABI_RESERVE(2) + KABI_USE(2, struct module *attach_btf_mod) KABI_RESERVE(3) KABI_RESERVE(4) KABI_RESERVE(5) -- 2.34.1
From: Kui-Feng Lee <thinker.li@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit 612d087d4ba54cef47946e22e5dabad762dd7ed5 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- A value_type should consist of three components: refcnt, state, and data. refcnt and state has been move to struct bpf_struct_ops_common_value to make it easier to check the value type. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240119225005.668602-11-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- include/linux/bpf.h | 12 +++++ kernel/bpf/bpf_struct_ops.c | 93 ++++++++++++++++++++++++------------- 2 files changed, 72 insertions(+), 33 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index a4a6bf41abce..6eba14c2eb93 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1814,6 +1814,18 @@ struct bpf_struct_ops_desc { u32 value_id; }; +enum bpf_struct_ops_state { + BPF_STRUCT_OPS_STATE_INIT, + BPF_STRUCT_OPS_STATE_INUSE, + BPF_STRUCT_OPS_STATE_TOBEFREE, + BPF_STRUCT_OPS_STATE_READY, +}; + +struct bpf_struct_ops_common_value { + refcount_t refcnt; + enum bpf_struct_ops_state state; +}; + #if defined(CONFIG_BPF_JIT) && defined(CONFIG_BPF_SYSCALL) #define BPF_MODULE_OWNER ((void *)((0xeB9FUL << 2) + POISON_POINTER_DELTA)) const struct bpf_struct_ops_desc *bpf_struct_ops_find(struct btf *btf, u32 type_id); diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c index 02216a8d9265..30ab34fab0f8 100644 --- a/kernel/bpf/bpf_struct_ops.c +++ b/kernel/bpf/bpf_struct_ops.c @@ -13,19 +13,8 @@ #include <linux/btf_ids.h> #include <linux/rcupdate_wait.h> -enum bpf_struct_ops_state { - BPF_STRUCT_OPS_STATE_INIT, - BPF_STRUCT_OPS_STATE_INUSE, - BPF_STRUCT_OPS_STATE_TOBEFREE, - BPF_STRUCT_OPS_STATE_READY, -}; - -#define BPF_STRUCT_OPS_COMMON_VALUE \ - refcount_t refcnt; \ - enum bpf_struct_ops_state state - struct bpf_struct_ops_value { - BPF_STRUCT_OPS_COMMON_VALUE; + struct bpf_struct_ops_common_value common; char data[] ____cacheline_aligned_in_smp; }; @@ -81,8 +70,8 @@ static DEFINE_MUTEX(update_mutex); #define BPF_STRUCT_OPS_TYPE(_name) \ extern struct bpf_struct_ops bpf_##_name; \ \ -struct bpf_struct_ops_##_name { \ - BPF_STRUCT_OPS_COMMON_VALUE; \ +struct bpf_struct_ops_##_name { \ + struct bpf_struct_ops_common_value common; \ struct _name data ____cacheline_aligned_in_smp; \ }; #include "bpf_struct_ops_types.h" @@ -113,11 +102,49 @@ const struct bpf_prog_ops bpf_struct_ops_prog_ops = { BTF_ID_LIST(st_ops_ids) BTF_ID(struct, module) +BTF_ID(struct, bpf_struct_ops_common_value) enum { IDX_MODULE_ID, + IDX_ST_OPS_COMMON_VALUE_ID, }; +extern struct btf *btf_vmlinux; + +static bool is_valid_value_type(struct btf *btf, s32 value_id, + const struct btf_type *type, + const char *value_name) +{ + const struct btf_type *common_value_type; + const struct btf_member *member; + const struct btf_type *vt, *mt; + + vt = btf_type_by_id(btf, value_id); + if (btf_vlen(vt) != 2) { + pr_warn("The number of %s's members should be 2, but we get %d\n", + value_name, btf_vlen(vt)); + return false; + } + member = btf_type_member(vt); + mt = btf_type_by_id(btf, member->type); + common_value_type = btf_type_by_id(btf_vmlinux, + st_ops_ids[IDX_ST_OPS_COMMON_VALUE_ID]); + if (mt != 
common_value_type) { + pr_warn("The first member of %s should be bpf_struct_ops_common_value\n", + value_name); + return false; + } + member++; + mt = btf_type_by_id(btf, member->type); + if (mt != type) { + pr_warn("The second member of %s should be %s\n", + value_name, btf_name_by_offset(btf, type->name_off)); + return false; + } + + return true; +} + static void bpf_struct_ops_desc_init(struct bpf_struct_ops_desc *st_ops_desc, struct btf *btf, struct bpf_verifier_log *log) @@ -138,14 +165,6 @@ static void bpf_struct_ops_desc_init(struct bpf_struct_ops_desc *st_ops_desc, } sprintf(value_name, "%s%s", VALUE_PREFIX, st_ops->name); - value_id = btf_find_by_name_kind(btf, value_name, - BTF_KIND_STRUCT); - if (value_id < 0) { - pr_warn("Cannot find struct %s in %s\n", - value_name, btf_get_name(btf)); - return; - } - type_id = btf_find_by_name_kind(btf, st_ops->name, BTF_KIND_STRUCT); if (type_id < 0) { @@ -160,6 +179,16 @@ static void bpf_struct_ops_desc_init(struct bpf_struct_ops_desc *st_ops_desc, return; } + value_id = btf_find_by_name_kind(btf, value_name, + BTF_KIND_STRUCT); + if (value_id < 0) { + pr_warn("Cannot find struct %s in %s\n", + value_name, btf_get_name(btf)); + return; + } + if (!is_valid_value_type(btf, value_id, t, value_name)) + return; + for_each_member(i, t, member) { const struct btf_type *func_proto; @@ -219,8 +248,6 @@ void bpf_struct_ops_init(struct btf *btf, struct bpf_verifier_log *log) } } -extern struct btf *btf_vmlinux; - static const struct bpf_struct_ops_desc * bpf_struct_ops_find_value(struct btf *btf, u32 value_id) { @@ -276,7 +303,7 @@ int bpf_struct_ops_map_sys_lookup_elem(struct bpf_map *map, void *key, kvalue = &st_map->kvalue; /* Pair with smp_store_release() during map_update */ - state = smp_load_acquire(&kvalue->state); + state = smp_load_acquire(&kvalue->common.state); if (state == BPF_STRUCT_OPS_STATE_INIT) { memset(value, 0, map->value_size); return 0; @@ -287,7 +314,7 @@ int bpf_struct_ops_map_sys_lookup_elem(struct bpf_map *map, void *key, */ uvalue = value; memcpy(uvalue, st_map->uvalue, map->value_size); - uvalue->state = state; + uvalue->common.state = state; /* This value offers the user space a general estimate of how * many sockets are still utilizing this struct_ops for TCP @@ -295,7 +322,7 @@ int bpf_struct_ops_map_sys_lookup_elem(struct bpf_map *map, void *key, * should sufficiently meet our present goals. */ refcnt = atomic64_read(&map->refcnt) - atomic64_read(&map->usercnt); - refcount_set(&uvalue->refcnt, max_t(s64, refcnt, 0)); + refcount_set(&uvalue->common.refcnt, max_t(s64, refcnt, 0)); return 0; } @@ -413,7 +440,7 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, if (err) return err; - if (uvalue->state || refcount_read(&uvalue->refcnt)) + if (uvalue->common.state || refcount_read(&uvalue->common.refcnt)) return -EINVAL; tlinks = kcalloc(BPF_TRAMP_MAX, sizeof(*tlinks), GFP_KERNEL); @@ -425,7 +452,7 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, mutex_lock(&st_map->lock); - if (kvalue->state != BPF_STRUCT_OPS_STATE_INIT) { + if (kvalue->common.state != BPF_STRUCT_OPS_STATE_INIT) { err = -EBUSY; goto unlock; } @@ -540,7 +567,7 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, * * Pair with smp_load_acquire() during lookup_elem(). 
*/ - smp_store_release(&kvalue->state, BPF_STRUCT_OPS_STATE_READY); + smp_store_release(&kvalue->common.state, BPF_STRUCT_OPS_STATE_READY); goto unlock; } @@ -558,7 +585,7 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, * It ensures the above udata updates (e.g. prog->aux->id) * can be seen once BPF_STRUCT_OPS_STATE_INUSE is set. */ - smp_store_release(&kvalue->state, BPF_STRUCT_OPS_STATE_INUSE); + smp_store_release(&kvalue->common.state, BPF_STRUCT_OPS_STATE_INUSE); goto unlock; } @@ -588,7 +615,7 @@ static long bpf_struct_ops_map_delete_elem(struct bpf_map *map, void *key) if (st_map->map.map_flags & BPF_F_LINK) return -EOPNOTSUPP; - prev_state = cmpxchg(&st_map->kvalue.state, + prev_state = cmpxchg(&st_map->kvalue.common.state, BPF_STRUCT_OPS_STATE_INUSE, BPF_STRUCT_OPS_STATE_TOBEFREE); switch (prev_state) { @@ -848,7 +875,7 @@ static bool bpf_struct_ops_valid_to_reg(struct bpf_map *map) return map->map_type == BPF_MAP_TYPE_STRUCT_OPS && map->map_flags & BPF_F_LINK && /* Pair with smp_store_release() during map_update */ - smp_load_acquire(&st_map->kvalue.state) == BPF_STRUCT_OPS_STATE_READY; + smp_load_acquire(&st_map->kvalue.common.state) == BPF_STRUCT_OPS_STATE_READY; } static void bpf_struct_ops_map_link_dealloc(struct bpf_link *link) -- 2.34.1
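For reference, this is the shape the userspace-visible value type now takes for each struct_ops type, shown for tcp_congestion_ops (this mirrors what the BPF_STRUCT_OPS_TYPE macro in this patch expands to); readers and writers now reach refcnt and state through the common header:

struct bpf_struct_ops_tcp_congestion_ops {
	struct bpf_struct_ops_common_value common;	/* was: flat refcnt + state */
	struct tcp_congestion_ops data ____cacheline_aligned_in_smp;
};

/* Lookup pairs with the update path through the common header, e.g.:
 * state = smp_load_acquire(&kvalue->common.state);
 */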
From: Kui-Feng Lee <thinker.li@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit f6be98d19985411ca1f3d53413d94d5b7f41c200 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Replace the static list of struct_ops types with per-btf struct_ops_tab to enable dynamic registration. Both bpf_dummy_ops and bpf_tcp_ca now utilize the registration function instead of being listed in bpf_struct_ops_types.h. Cc: netdev@vger.kernel.org Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240119225005.668602-12-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- include/linux/bpf.h | 27 +++++--- include/linux/btf.h | 12 ++++ kernel/bpf/bpf_struct_ops.c | 100 ++++-------------------------- kernel/bpf/bpf_struct_ops_types.h | 12 ---- kernel/bpf/btf.c | 86 +++++++++++++++++++++++-- net/bpf/bpf_dummy_struct_ops.c | 11 +++- net/ipv4/bpf_tcp_ca.c | 12 +++- 7 files changed, 142 insertions(+), 118 deletions(-) delete mode 100644 kernel/bpf/bpf_struct_ops_types.h diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 6eba14c2eb93..0624b33be7a2 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1827,9 +1827,20 @@ struct bpf_struct_ops_common_value { }; #if defined(CONFIG_BPF_JIT) && defined(CONFIG_BPF_SYSCALL) +/* This macro helps developer to register a struct_ops type and generate + * type information correctly. Developers should use this macro to register + * a struct_ops type instead of calling __register_bpf_struct_ops() directly. + */ +#define register_bpf_struct_ops(st_ops, type) \ + ({ \ + struct bpf_struct_ops_##type { \ + struct bpf_struct_ops_common_value common; \ + struct type data ____cacheline_aligned_in_smp; \ + }; \ + BTF_TYPE_EMIT(struct bpf_struct_ops_##type); \ + __register_bpf_struct_ops(st_ops); \ + }) #define BPF_MODULE_OWNER ((void *)((0xeB9FUL << 2) + POISON_POINTER_DELTA)) -const struct bpf_struct_ops_desc *bpf_struct_ops_find(struct btf *btf, u32 type_id); -void bpf_struct_ops_init(struct btf *btf, struct bpf_verifier_log *log); bool bpf_struct_ops_get(const void *kdata); void bpf_struct_ops_put(const void *kdata); int bpf_struct_ops_map_sys_lookup_elem(struct bpf_map *map, void *key, @@ -1871,16 +1882,12 @@ struct bpf_dummy_ops { int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr, union bpf_attr __user *uattr); #endif +int bpf_struct_ops_desc_init(struct bpf_struct_ops_desc *st_ops_desc, + struct btf *btf, + struct bpf_verifier_log *log); void bpf_map_struct_ops_info_fill(struct bpf_map_info *info, struct bpf_map *map); #else -static inline const struct bpf_struct_ops_desc *bpf_struct_ops_find(struct btf *btf, u32 type_id) -{ - return NULL; -} -static inline void bpf_struct_ops_init(struct btf *btf, - struct bpf_verifier_log *log) -{ -} +#define register_bpf_struct_ops(st_ops, type) ({ (void *)(st_ops); 0; }) static inline bool bpf_try_module_get(const void *data, struct module *owner) { return try_module_get(owner); diff --git a/include/linux/btf.h b/include/linux/btf.h index 1d852dad7473..79f34103ab5d 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -497,6 +497,18 @@ static inline void *btf_id_set8_contains(const struct btf_id_set8 *set, u32 id) struct bpf_verifier_log; +#if defined(CONFIG_BPF_JIT) && 
defined(CONFIG_BPF_SYSCALL) +struct bpf_struct_ops; +int __register_bpf_struct_ops(struct bpf_struct_ops *st_ops); +const struct bpf_struct_ops_desc *bpf_struct_ops_find_value(struct btf *btf, u32 value_id); +const struct bpf_struct_ops_desc *bpf_struct_ops_find(struct btf *btf, u32 type_id); +#else +static inline const struct bpf_struct_ops_desc *bpf_struct_ops_find(struct btf *btf, u32 type_id) +{ + return NULL; +} +#endif + #ifdef CONFIG_BPF_SYSCALL const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id); const char *btf_name_by_offset(const struct btf *btf, u32 offset); diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c index 30ab34fab0f8..defc052e4622 100644 --- a/kernel/bpf/bpf_struct_ops.c +++ b/kernel/bpf/bpf_struct_ops.c @@ -62,35 +62,6 @@ static DEFINE_MUTEX(update_mutex); #define VALUE_PREFIX "bpf_struct_ops_" #define VALUE_PREFIX_LEN (sizeof(VALUE_PREFIX) - 1) -/* bpf_struct_ops_##_name (e.g. bpf_struct_ops_tcp_congestion_ops) is - * the map's value exposed to the userspace and its btf-type-id is - * stored at the map->btf_vmlinux_value_type_id. - * - */ -#define BPF_STRUCT_OPS_TYPE(_name) \ -extern struct bpf_struct_ops bpf_##_name; \ - \ -struct bpf_struct_ops_##_name { \ - struct bpf_struct_ops_common_value common; \ - struct _name data ____cacheline_aligned_in_smp; \ -}; -#include "bpf_struct_ops_types.h" -#undef BPF_STRUCT_OPS_TYPE - -enum { -#define BPF_STRUCT_OPS_TYPE(_name) BPF_STRUCT_OPS_TYPE_##_name, -#include "bpf_struct_ops_types.h" -#undef BPF_STRUCT_OPS_TYPE - __NR_BPF_STRUCT_OPS_TYPE, -}; - -static struct bpf_struct_ops_desc bpf_struct_ops[] = { -#define BPF_STRUCT_OPS_TYPE(_name) \ - [BPF_STRUCT_OPS_TYPE_##_name] = { .st_ops = &bpf_##_name }, -#include "bpf_struct_ops_types.h" -#undef BPF_STRUCT_OPS_TYPE -}; - const struct bpf_verifier_ops bpf_struct_ops_verifier_ops = { }; @@ -145,9 +116,9 @@ static bool is_valid_value_type(struct btf *btf, s32 value_id, return true; } -static void bpf_struct_ops_desc_init(struct bpf_struct_ops_desc *st_ops_desc, - struct btf *btf, - struct bpf_verifier_log *log) +int bpf_struct_ops_desc_init(struct bpf_struct_ops_desc *st_ops_desc, + struct btf *btf, + struct bpf_verifier_log *log) { struct bpf_struct_ops *st_ops = st_ops_desc->st_ops; const struct btf_member *member; @@ -161,7 +132,7 @@ static void bpf_struct_ops_desc_init(struct bpf_struct_ops_desc *st_ops_desc, sizeof(value_name)) { pr_warn("struct_ops name %s is too long\n", st_ops->name); - return; + return -EINVAL; } sprintf(value_name, "%s%s", VALUE_PREFIX, st_ops->name); @@ -170,13 +141,13 @@ static void bpf_struct_ops_desc_init(struct bpf_struct_ops_desc *st_ops_desc, if (type_id < 0) { pr_warn("Cannot find struct %s in %s\n", st_ops->name, btf_get_name(btf)); - return; + return -EINVAL; } t = btf_type_by_id(btf, type_id); if (btf_type_vlen(t) > BPF_STRUCT_OPS_MAX_NR_MEMBERS) { pr_warn("Cannot support #%u members in struct %s\n", btf_type_vlen(t), st_ops->name); - return; + return -EINVAL; } value_id = btf_find_by_name_kind(btf, value_name, @@ -184,10 +155,10 @@ static void bpf_struct_ops_desc_init(struct bpf_struct_ops_desc *st_ops_desc, if (value_id < 0) { pr_warn("Cannot find struct %s in %s\n", value_name, btf_get_name(btf)); - return; + return -EINVAL; } if (!is_valid_value_type(btf, value_id, t, value_name)) - return; + return -EINVAL; for_each_member(i, t, member) { const struct btf_type *func_proto; @@ -196,13 +167,13 @@ static void bpf_struct_ops_desc_init(struct bpf_struct_ops_desc *st_ops_desc, if (!*mname) { 
pr_warn("anon member in struct %s is not supported\n", st_ops->name); - break; + return -EOPNOTSUPP; } if (__btf_member_bitfield_size(t, member)) { pr_warn("bit field member %s in struct %s is not supported\n", mname, st_ops->name); - break; + return -EOPNOTSUPP; } func_proto = btf_type_resolve_func_ptr(btf, @@ -214,7 +185,7 @@ static void bpf_struct_ops_desc_init(struct bpf_struct_ops_desc *st_ops_desc, &st_ops->func_models[i])) { pr_warn("Error in parsing func ptr %s in struct %s\n", mname, st_ops->name); - break; + return -EINVAL; } } @@ -222,6 +193,7 @@ static void bpf_struct_ops_desc_init(struct bpf_struct_ops_desc *st_ops_desc, if (st_ops->init(btf)) { pr_warn("Error in init bpf_struct_ops %s\n", st_ops->name); + return -EINVAL; } else { st_ops_desc->type_id = type_id; st_ops_desc->type = t; @@ -230,54 +202,8 @@ static void bpf_struct_ops_desc_init(struct bpf_struct_ops_desc *st_ops_desc, value_id); } } -} -void bpf_struct_ops_init(struct btf *btf, struct bpf_verifier_log *log) -{ - struct bpf_struct_ops_desc *st_ops_desc; - u32 i; - - /* Ensure BTF type is emitted for "struct bpf_struct_ops_##_name" */ -#define BPF_STRUCT_OPS_TYPE(_name) BTF_TYPE_EMIT(struct bpf_struct_ops_##_name); -#include "bpf_struct_ops_types.h" -#undef BPF_STRUCT_OPS_TYPE - - for (i = 0; i < ARRAY_SIZE(bpf_struct_ops); i++) { - st_ops_desc = &bpf_struct_ops[i]; - bpf_struct_ops_desc_init(st_ops_desc, btf, log); - } -} - -static const struct bpf_struct_ops_desc * -bpf_struct_ops_find_value(struct btf *btf, u32 value_id) -{ - unsigned int i; - - if (!value_id || !btf) - return NULL; - - for (i = 0; i < ARRAY_SIZE(bpf_struct_ops); i++) { - if (bpf_struct_ops[i].value_id == value_id) - return &bpf_struct_ops[i]; - } - - return NULL; -} - -const struct bpf_struct_ops_desc * -bpf_struct_ops_find(struct btf *btf, u32 type_id) -{ - unsigned int i; - - if (!type_id || !btf) - return NULL; - - for (i = 0; i < ARRAY_SIZE(bpf_struct_ops); i++) { - if (bpf_struct_ops[i].type_id == type_id) - return &bpf_struct_ops[i]; - } - - return NULL; + return 0; } static int bpf_struct_ops_map_get_next_key(struct bpf_map *map, void *key, diff --git a/kernel/bpf/bpf_struct_ops_types.h b/kernel/bpf/bpf_struct_ops_types.h deleted file mode 100644 index 5678a9ddf817..000000000000 --- a/kernel/bpf/bpf_struct_ops_types.h +++ /dev/null @@ -1,12 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -/* internal file - do not include directly */ - -#ifdef CONFIG_BPF_JIT -#ifdef CONFIG_NET -BPF_STRUCT_OPS_TYPE(bpf_dummy_ops) -#endif -#ifdef CONFIG_INET -#include <net/tcp.h> -BPF_STRUCT_OPS_TYPE(tcp_congestion_ops) -#endif -#endif diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index 4ecdd08aec7f..1cd7e8fbcc57 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -19,6 +19,7 @@ #include <linux/bpf_verifier.h> #include <linux/btf.h> #include <linux/btf_ids.h> +#include <linux/bpf.h> #include <linux/bpf_lsm.h> #include <linux/skmsg.h> #include <linux/perf_event.h> @@ -5801,8 +5802,6 @@ struct btf *btf_parse_vmlinux(void) /* btf_parse_vmlinux() runs under bpf_verifier_lock */ bpf_ctx_convert.t = btf_type_by_id(btf, bpf_ctx_convert_btf_id[0]); - bpf_struct_ops_init(btf, log); - refcount_set(&btf->refcnt, 1); err = btf_alloc_id(btf); @@ -8895,11 +8894,13 @@ bool btf_type_ids_nocast_alias(struct bpf_verifier_log *log, return !strncmp(reg_name, arg_name, cmp_len); } +#ifdef CONFIG_BPF_JIT static int -btf_add_struct_ops(struct btf *btf, struct bpf_struct_ops *st_ops) +btf_add_struct_ops(struct btf *btf, struct bpf_struct_ops *st_ops, + struct 
bpf_verifier_log *log) { struct btf_struct_ops_tab *tab, *new_tab; - int i; + int i, err; tab = btf->struct_ops_tab; if (!tab) { @@ -8929,7 +8930,84 @@ btf_add_struct_ops(struct btf *btf, struct bpf_struct_ops *st_ops) tab->ops[btf->struct_ops_tab->cnt].st_ops = st_ops; + err = bpf_struct_ops_desc_init(&tab->ops[btf->struct_ops_tab->cnt], btf, log); + if (err) + return err; + btf->struct_ops_tab->cnt++; return 0; } + +const struct bpf_struct_ops_desc * +bpf_struct_ops_find_value(struct btf *btf, u32 value_id) +{ + const struct bpf_struct_ops_desc *st_ops_list; + unsigned int i; + u32 cnt; + + if (!value_id) + return NULL; + if (!btf->struct_ops_tab) + return NULL; + + cnt = btf->struct_ops_tab->cnt; + st_ops_list = btf->struct_ops_tab->ops; + for (i = 0; i < cnt; i++) { + if (st_ops_list[i].value_id == value_id) + return &st_ops_list[i]; + } + + return NULL; +} + +const struct bpf_struct_ops_desc * +bpf_struct_ops_find(struct btf *btf, u32 type_id) +{ + const struct bpf_struct_ops_desc *st_ops_list; + unsigned int i; + u32 cnt; + + if (!type_id) + return NULL; + if (!btf->struct_ops_tab) + return NULL; + + cnt = btf->struct_ops_tab->cnt; + st_ops_list = btf->struct_ops_tab->ops; + for (i = 0; i < cnt; i++) { + if (st_ops_list[i].type_id == type_id) + return &st_ops_list[i]; + } + + return NULL; +} + +int __register_bpf_struct_ops(struct bpf_struct_ops *st_ops) +{ + struct bpf_verifier_log *log; + struct btf *btf; + int err = 0; + + btf = btf_get_module_btf(st_ops->owner); + if (!btf) + return -EINVAL; + + log = kzalloc(sizeof(*log), GFP_KERNEL | __GFP_NOWARN); + if (!log) { + err = -ENOMEM; + goto errout; + } + + log->level = BPF_LOG_KERNEL; + + err = btf_add_struct_ops(btf, st_ops, log); + +errout: + kfree(log); + btf_put(btf); + + return err; +} +EXPORT_SYMBOL_GPL(__register_bpf_struct_ops); +#endif diff --git a/net/bpf/bpf_dummy_struct_ops.c b/net/bpf/bpf_dummy_struct_ops.c index ba2c58dba2da..02de71719aed 100644 --- a/net/bpf/bpf_dummy_struct_ops.c +++ b/net/bpf/bpf_dummy_struct_ops.c @@ -7,7 +7,7 @@ #include <linux/bpf.h> #include <linux/btf.h> -extern struct bpf_struct_ops bpf_bpf_dummy_ops; +static struct bpf_struct_ops bpf_bpf_dummy_ops; /* A common type for test_N with return value in bpf_dummy_ops */ typedef int (*dummy_ops_test_ret_fn)(struct bpf_dummy_ops_state *state, ...); @@ -256,7 +256,7 @@ static struct bpf_dummy_ops __bpf_bpf_dummy_ops = { .test_sleepable = bpf_dummy_test_sleepable, }; -struct bpf_struct_ops bpf_bpf_dummy_ops = { +static struct bpf_struct_ops bpf_bpf_dummy_ops = { .verifier_ops = &bpf_dummy_verifier_ops, .init = bpf_dummy_init, .check_member = bpf_dummy_ops_check_member, @@ -265,4 +265,11 @@ struct bpf_struct_ops bpf_bpf_dummy_ops = { .unreg = bpf_dummy_unreg, .name = "bpf_dummy_ops", .cfi_stubs = &__bpf_bpf_dummy_ops, + .owner = THIS_MODULE, }; + +static int __init bpf_dummy_struct_ops_init(void) +{ + return register_bpf_struct_ops(&bpf_bpf_dummy_ops, bpf_dummy_ops); +} +late_initcall(bpf_dummy_struct_ops_init); diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c index dffd8828079b..8e7716256d3c 100644 --- a/net/ipv4/bpf_tcp_ca.c +++ b/net/ipv4/bpf_tcp_ca.c @@ -12,7 +12,7 @@ #include <net/bpf_sk_storage.h> /* "extern" is to avoid sparse warning. It is only used in bpf_struct_ops.c. 
*/ -extern struct bpf_struct_ops bpf_tcp_congestion_ops; +static struct bpf_struct_ops bpf_tcp_congestion_ops; static u32 unsupported_ops[] = { offsetof(struct tcp_congestion_ops, get_info), @@ -345,7 +345,7 @@ static struct tcp_congestion_ops __bpf_ops_tcp_congestion_ops = { .release = __bpf_tcp_ca_release, }; -struct bpf_struct_ops bpf_tcp_congestion_ops = { +static struct bpf_struct_ops bpf_tcp_congestion_ops = { .verifier_ops = &bpf_tcp_ca_verifier_ops, .reg = bpf_tcp_ca_reg, .unreg = bpf_tcp_ca_unreg, @@ -356,10 +356,16 @@ struct bpf_struct_ops bpf_tcp_congestion_ops = { .validate = bpf_tcp_ca_validate, .name = "tcp_congestion_ops", .cfi_stubs = &__bpf_ops_tcp_congestion_ops, + .owner = THIS_MODULE, }; static int __init bpf_tcp_ca_kfunc_init(void) { - return register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, &bpf_tcp_ca_kfunc_set); + int ret; + + ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, &bpf_tcp_ca_kfunc_set); + ret = ret ?: register_bpf_struct_ops(&bpf_tcp_congestion_ops, tcp_congestion_ops); + + return ret; } late_initcall(bpf_tcp_ca_kfunc_init); -- 2.34.1
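A hedged sketch of the new registration flow from a hypothetical module providing its own struct_ops type "my_ops"; this mirrors the tcp_congestion_ops and bpf_dummy_ops conversions above, with the callbacks stubbed out:

#include <linux/bpf.h>
#include <linux/btf.h>
#include <linux/init.h>
#include <linux/module.h>

/* Hypothetical ops struct provided by this module */
struct my_ops {
	int (*do_thing)(int arg);
};

static int my_ops_init(struct btf *btf) { return 0; }
static int my_ops_reg(void *kdata) { return 0; }
static void my_ops_unreg(void *kdata) { }

static const struct bpf_verifier_ops my_ops_verifier_ops = {
	.is_valid_access = btf_ctx_access,
};

static struct bpf_struct_ops bpf_my_ops = {
	.verifier_ops	= &my_ops_verifier_ops,
	.init		= my_ops_init,
	.reg		= my_ops_reg,
	.unreg		= my_ops_unreg,
	.name		= "my_ops",
	.owner		= THIS_MODULE,
	/* a real type would also provide .check_member, .cfi_stubs, etc. */
};

static int __init my_ops_module_init(void)
{
	/* The macro also emits BTF for struct bpf_struct_ops_my_ops */
	return register_bpf_struct_ops(&bpf_my_ops, my_ops);
}
late_initcall(my_ops_module_init);

MODULE_LICENSE("GPL");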
From: Kui-Feng Lee <thinker.li@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit 9e926acda0c2e21bca431a1818665ddcd6939755 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Locate the module BTFs for struct_ops maps and progs and pass them to the kernel. This ensures that the kernel correctly resolves type IDs from the appropriate module BTFs. For the map of a struct_ops object, the FD of the module BTF is set to bpf_map to keep a reference to the module BTF. The FD is passed to the kernel as value_type_btf_obj_fd when the struct_ops object is loaded. For a bpf_struct_ops prog, attach_btf_obj_fd of bpf_prog is the FD of a module BTF in the kernel. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240119225005.668602-13-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- tools/lib/bpf/bpf.c | 4 +++- tools/lib/bpf/bpf.h | 4 +++- tools/lib/bpf/libbpf.c | 41 ++++++++++++++++++++++++++--------- tools/lib/bpf/libbpf_probes.c | 1 + 4 files changed, 38 insertions(+), 12 deletions(-) diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c index a5e21d9baf61..6a8b495b53f5 100644 --- a/tools/lib/bpf/bpf.c +++ b/tools/lib/bpf/bpf.c @@ -171,7 +171,8 @@ int bpf_map_create(enum bpf_map_type map_type, __u32 max_entries, const struct bpf_map_create_opts *opts) { - const size_t attr_sz = offsetofend(union bpf_attr, map_extra); + const size_t attr_sz = offsetofend(union bpf_attr, + value_type_btf_obj_fd); union bpf_attr attr; int fd; @@ -193,6 +194,7 @@ int bpf_map_create(enum bpf_map_type map_type, attr.btf_key_type_id = OPTS_GET(opts, btf_key_type_id, 0); attr.btf_value_type_id = OPTS_GET(opts, btf_value_type_id, 0); attr.btf_vmlinux_value_type_id = OPTS_GET(opts, btf_vmlinux_value_type_id, 0); + attr.value_type_btf_obj_fd = OPTS_GET(opts, value_type_btf_obj_fd, -1); attr.inner_map_fd = OPTS_GET(opts, inner_map_fd, 0); attr.map_flags = OPTS_GET(opts, map_flags, 0); diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h index 107fef748868..db0ff8ade19a 100644 --- a/tools/lib/bpf/bpf.h +++ b/tools/lib/bpf/bpf.h @@ -51,8 +51,10 @@ struct bpf_map_create_opts { __u32 numa_node; __u32 map_ifindex; + __s32 value_type_btf_obj_fd; + size_t:0; }; -#define bpf_map_create_opts__last_field map_ifindex +#define bpf_map_create_opts__last_field value_type_btf_obj_fd LIBBPF_API int bpf_map_create(enum bpf_map_type map_type, const char *map_name, diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index d55a75b48bbe..072f72c1b40c 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -523,6 +523,7 @@ struct bpf_map { struct bpf_map_def def; __u32 numa_node; __u32 btf_var_idx; + int mod_btf_fd; __u32 btf_key_type_id; __u32 btf_value_type_id; __u32 btf_vmlinux_value_type_id; @@ -923,22 +924,29 @@ find_member_by_name(const struct btf *btf, const struct btf_type *t, return NULL; } +static int find_ksym_btf_id(struct bpf_object *obj, const char *ksym_name, + __u16 kind, struct btf **res_btf, + struct module_btf **res_mod_btf); + #define STRUCT_OPS_VALUE_PREFIX "bpf_struct_ops_" static int find_btf_by_prefix_kind(const struct btf *btf, const char *prefix, const char *name, __u32 kind); static int -find_struct_ops_kern_types(const struct btf *btf, 
const char *tname, +find_struct_ops_kern_types(struct bpf_object *obj, const char *tname, + struct module_btf **mod_btf, const struct btf_type **type, __u32 *type_id, const struct btf_type **vtype, __u32 *vtype_id, const struct btf_member **data_member) { const struct btf_type *kern_type, *kern_vtype; const struct btf_member *kern_data_member; + struct btf *btf; __s32 kern_vtype_id, kern_type_id; __u32 i; - kern_type_id = btf__find_by_name_kind(btf, tname, BTF_KIND_STRUCT); + kern_type_id = find_ksym_btf_id(obj, tname, BTF_KIND_STRUCT, + &btf, mod_btf); if (kern_type_id < 0) { pr_warn("struct_ops init_kern: struct %s is not found in kernel BTF\n", tname); @@ -992,14 +1000,16 @@ static bool bpf_map__is_struct_ops(const struct bpf_map *map) } /* Init the map's fields that depend on kern_btf */ -static int bpf_map__init_kern_struct_ops(struct bpf_map *map, - const struct btf *btf, - const struct btf *kern_btf) +static int bpf_map__init_kern_struct_ops(struct bpf_map *map) { const struct btf_member *member, *kern_member, *kern_data_member; const struct btf_type *type, *kern_type, *kern_vtype; __u32 i, kern_type_id, kern_vtype_id, kern_data_off; + struct bpf_object *obj = map->obj; + const struct btf *btf = obj->btf; struct bpf_struct_ops *st_ops; + const struct btf *kern_btf; + struct module_btf *mod_btf; void *data, *kern_data; const char *tname; int err; @@ -1007,16 +1017,19 @@ static int bpf_map__init_kern_struct_ops(struct bpf_map *map, st_ops = map->st_ops; type = st_ops->type; tname = st_ops->tname; - err = find_struct_ops_kern_types(kern_btf, tname, + err = find_struct_ops_kern_types(obj, tname, &mod_btf, &kern_type, &kern_type_id, &kern_vtype, &kern_vtype_id, &kern_data_member); if (err) return err; + kern_btf = mod_btf ? mod_btf->btf : obj->btf_vmlinux; + pr_debug("struct_ops init_kern %s: type_id:%u kern_type_id:%u kern_vtype_id:%u\n", map->name, st_ops->type_id, kern_type_id, kern_vtype_id); + map->mod_btf_fd = mod_btf ? 
mod_btf->fd : -1; map->def.value_size = kern_vtype->size; map->btf_vmlinux_value_type_id = kern_vtype_id; @@ -1092,6 +1105,8 @@ static int bpf_map__init_kern_struct_ops(struct bpf_map *map, return -ENOTSUP; } + if (mod_btf) + prog->attach_btf_obj_fd = mod_btf->fd; prog->attach_btf_id = kern_type_id; prog->expected_attach_type = kern_member_idx; @@ -1134,8 +1149,7 @@ static int bpf_object__init_kern_struct_ops_maps(struct bpf_object *obj) if (!bpf_map__is_struct_ops(map)) continue; - err = bpf_map__init_kern_struct_ops(map, obj->btf, - obj->btf_vmlinux); + err = bpf_map__init_kern_struct_ops(map); if (err) return err; } @@ -5129,8 +5143,13 @@ static int bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map, b create_attr.numa_node = map->numa_node; create_attr.map_extra = map->map_extra; - if (bpf_map__is_struct_ops(map)) + if (bpf_map__is_struct_ops(map)) { create_attr.btf_vmlinux_value_type_id = map->btf_vmlinux_value_type_id; + if (map->mod_btf_fd >= 0) { + create_attr.value_type_btf_obj_fd = map->mod_btf_fd; + create_attr.map_flags |= BPF_F_VTYPE_BTF_OBJ_FD; + } + } if (obj->btf && btf__fd(obj->btf) >= 0) { create_attr.btf_fd = btf__fd(obj->btf); @@ -9452,7 +9471,9 @@ static int libbpf_find_attach_btf_id(struct bpf_program *prog, const char *attac *btf_obj_fd = 0; *btf_type_id = 1; } else { - err = find_kernel_btf_id(prog->obj, attach_name, attach_type, btf_obj_fd, btf_type_id); + err = find_kernel_btf_id(prog->obj, attach_name, + attach_type, btf_obj_fd, + btf_type_id); } if (err) { pr_warn("prog '%s': failed to find kernel BTF type ID of '%s': %d\n", diff --git a/tools/lib/bpf/libbpf_probes.c b/tools/lib/bpf/libbpf_probes.c index 9c4db90b92b6..98373d126d9d 100644 --- a/tools/lib/bpf/libbpf_probes.c +++ b/tools/lib/bpf/libbpf_probes.c @@ -326,6 +326,7 @@ static int probe_map_create(enum bpf_map_type map_type) case BPF_MAP_TYPE_STRUCT_OPS: /* we'll get -ENOTSUPP for invalid BTF type ID for struct_ops */ opts.btf_vmlinux_value_type_id = 1; + opts.value_type_btf_obj_fd = -1; exp_err = -524; /* -ENOTSUPP */ break; case BPF_MAP_TYPE_BLOOM_FILTER: -- 2.34.1
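For illustration, once this opts field exists, a direct caller of the low-level bpf_map_create() API could pass a module BTF fd for a struct_ops map roughly as sketched below. This is not part of the patch: the map name, value size, and the way mod_btf_fd/vtype_id are obtained are placeholders (libbpf resolves them internally from the module BTF objects the kernel exposes), and it assumes UAPI headers that define BPF_F_VTYPE_BTF_OBJ_FD.

#include <linux/bpf.h>
#include <bpf/bpf.h>

/* Sketch: create a BPF_MAP_TYPE_STRUCT_OPS map whose value type is defined
 * in a module's BTF. mod_btf_fd and vtype_id are assumed to have been
 * resolved by the caller beforehand.
 */
static int create_struct_ops_map(int mod_btf_fd, __u32 vtype_id, __u32 value_size)
{
	LIBBPF_OPTS(bpf_map_create_opts, opts,
		.btf_vmlinux_value_type_id = vtype_id,
		.value_type_btf_obj_fd = mod_btf_fd,
		.map_flags = BPF_F_VTYPE_BTF_OBJ_FD,
	);

	/* struct_ops maps use a 4-byte key and a single element */
	return bpf_map_create(BPF_MAP_TYPE_STRUCT_OPS, "testmod_ops", 4,
			      value_size, 1, &opts);
}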
From: Kui-Feng Lee <thinker.li@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit 7c81c2490c73e614c6d48e4f339f4f224140b565 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Export btf_ctx_access() so that bpf_tracing_btf_ctx_access(), which is a thin wrapper around it, can be invoked from a module. This function is valuable for implementing validation functions that ensure proper access to ctx. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240119225005.668602-14-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Conflicts: kernel/bpf/btf.c [Fix context conflicts.] Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- kernel/bpf/btf.c | 1 + 1 file changed, 1 insertion(+) diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index 1cd7e8fbcc57..db3e74481f65 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -6400,6 +6400,7 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type, } return true; } +EXPORT_SYMBOL_GPL(btf_ctx_access); enum bpf_struct_walk_result { /* < 0 error */ -- 2.34.1
From: Kui-Feng Lee <thinker.li@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit 0253e0590e2dc46996534371d56b5297099aed4e category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Create a new struct_ops type called bpf_testmod_ops within the bpf_testmod module. When a struct_ops object is registered, the bpf_testmod module will invoke test_2 from the module. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240119225005.668602-15-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- .../selftests/bpf/bpf_testmod/bpf_testmod.c | 66 ++++++++++++++++ .../selftests/bpf/bpf_testmod/bpf_testmod.h | 5 ++ .../bpf/prog_tests/test_struct_ops_module.c | 75 +++++++++++++++++++ .../selftests/bpf/progs/struct_ops_module.c | 30 ++++++++ 4 files changed, 176 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/test_struct_ops_module.c create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_module.c diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c index 1b8589143bd3..ca863f436978 100644 --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c @@ -1,5 +1,6 @@ // SPDX-License-Identifier: GPL-2.0 /* Copyright (c) 2020 Facebook */ +#include <linux/bpf.h> #include <linux/btf.h> #include <linux/btf_ids.h> #include <linux/delay.h> @@ -516,11 +517,75 @@ BTF_ID_FLAGS(func, bpf_kfunc_call_test_static_unused_arg) BTF_ID_FLAGS(func, bpf_kfunc_call_test_offset) BTF_SET8_END(bpf_testmod_check_kfunc_ids) +static int bpf_testmod_ops_init(struct btf *btf) +{ + return 0; +} + +static bool bpf_testmod_ops_is_valid_access(int off, int size, + enum bpf_access_type type, + const struct bpf_prog *prog, + struct bpf_insn_access_aux *info) +{ + return bpf_tracing_btf_ctx_access(off, size, type, prog, info); +} + +static int bpf_testmod_ops_init_member(const struct btf_type *t, + const struct btf_member *member, + void *kdata, const void *udata) +{ + return 0; +} + static const struct btf_kfunc_id_set bpf_testmod_kfunc_set = { .owner = THIS_MODULE, .set = &bpf_testmod_check_kfunc_ids, }; +static const struct bpf_verifier_ops bpf_testmod_verifier_ops = { + .is_valid_access = bpf_testmod_ops_is_valid_access, +}; + +static int bpf_dummy_reg(void *kdata) +{ + struct bpf_testmod_ops *ops = kdata; + int r; + + r = ops->test_2(4, 3); + + return 0; +} + +static void bpf_dummy_unreg(void *kdata) +{ +} + +static int bpf_testmod_test_1(void) +{ + return 0; +} + +static int bpf_testmod_test_2(int a, int b) +{ + return 0; +} + +static struct bpf_testmod_ops __bpf_testmod_ops = { + .test_1 = bpf_testmod_test_1, + .test_2 = bpf_testmod_test_2, +}; + +struct bpf_struct_ops bpf_bpf_testmod_ops = { + .verifier_ops = &bpf_testmod_verifier_ops, + .init = bpf_testmod_ops_init, + .init_member = bpf_testmod_ops_init_member, + .reg = bpf_dummy_reg, + .unreg = bpf_dummy_unreg, + .cfi_stubs = &__bpf_testmod_ops, + .name = "bpf_testmod_ops", + .owner = THIS_MODULE, +}; + extern int bpf_fentry_test1(int a); static int bpf_testmod_init(void) @@ -531,6 +596,7 @@ static int bpf_testmod_init(void) ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_testmod_kfunc_set); ret = ret 
?: register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_testmod_kfunc_set); ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SYSCALL, &bpf_testmod_kfunc_set); + ret = ret ?: register_bpf_struct_ops(&bpf_bpf_testmod_ops, bpf_testmod_ops); if (ret < 0) return ret; if (bpf_fentry_test1(0) < 0) diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h index f32793efe095..ca5435751c79 100644 --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h @@ -28,4 +28,9 @@ struct bpf_iter_testmod_seq { int cnt; }; +struct bpf_testmod_ops { + int (*test_1)(void); + int (*test_2)(int a, int b); +}; + #endif /* _BPF_TESTMOD_H */ diff --git a/tools/testing/selftests/bpf/prog_tests/test_struct_ops_module.c b/tools/testing/selftests/bpf/prog_tests/test_struct_ops_module.c new file mode 100644 index 000000000000..8d833f0c7580 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/test_struct_ops_module.c @@ -0,0 +1,75 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */ +#include <test_progs.h> +#include <time.h> + +#include "struct_ops_module.skel.h" + +static void check_map_info(struct bpf_map_info *info) +{ + struct bpf_btf_info btf_info; + char btf_name[256]; + u32 btf_info_len = sizeof(btf_info); + int err, fd; + + fd = bpf_btf_get_fd_by_id(info->btf_vmlinux_id); + if (!ASSERT_GE(fd, 0, "get_value_type_btf_obj_fd")) + return; + + memset(&btf_info, 0, sizeof(btf_info)); + btf_info.name = ptr_to_u64(btf_name); + btf_info.name_len = sizeof(btf_name); + err = bpf_btf_get_info_by_fd(fd, &btf_info, &btf_info_len); + if (!ASSERT_OK(err, "get_value_type_btf_obj_info")) + goto cleanup; + + if (!ASSERT_EQ(strcmp(btf_name, "bpf_testmod"), 0, "get_value_type_btf_obj_name")) + goto cleanup; + +cleanup: + close(fd); +} + +static void test_struct_ops_load(void) +{ + DECLARE_LIBBPF_OPTS(bpf_object_open_opts, opts); + struct struct_ops_module *skel; + struct bpf_map_info info = {}; + struct bpf_link *link; + int err; + u32 len; + + skel = struct_ops_module__open_opts(&opts); + if (!ASSERT_OK_PTR(skel, "struct_ops_module_open")) + return; + + err = struct_ops_module__load(skel); + if (!ASSERT_OK(err, "struct_ops_module_load")) + goto cleanup; + + len = sizeof(info); + err = bpf_map_get_info_by_fd(bpf_map__fd(skel->maps.testmod_1), &info, + &len); + if (!ASSERT_OK(err, "bpf_map_get_info_by_fd")) + goto cleanup; + + link = bpf_map__attach_struct_ops(skel->maps.testmod_1); + ASSERT_OK_PTR(link, "attach_test_mod_1"); + + /* test_2() will be called from bpf_dummy_reg() in bpf_testmod.c */ + ASSERT_EQ(skel->bss->test_2_result, 7, "test_2_result"); + + bpf_link__destroy(link); + + check_map_info(&info); + +cleanup: + struct_ops_module__destroy(skel); +} + +void serial_test_struct_ops_module(void) +{ + if (test__start_subtest("test_struct_ops_load")) + test_struct_ops_load(); +} + diff --git a/tools/testing/selftests/bpf/progs/struct_ops_module.c b/tools/testing/selftests/bpf/progs/struct_ops_module.c new file mode 100644 index 000000000000..e44ac55195ca --- /dev/null +++ b/tools/testing/selftests/bpf/progs/struct_ops_module.c @@ -0,0 +1,30 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. 
*/ +#include <vmlinux.h> +#include <bpf/bpf_helpers.h> +#include <bpf/bpf_tracing.h> +#include "../bpf_testmod/bpf_testmod.h" + +char _license[] SEC("license") = "GPL"; + +int test_2_result = 0; + +SEC("struct_ops/test_1") +int BPF_PROG(test_1) +{ + return 0xdeadbeef; +} + +SEC("struct_ops/test_2") +int BPF_PROG(test_2, int a, int b) +{ + test_2_result = a + b; + return a + b; +} + +SEC(".struct_ops.link") +struct bpf_testmod_ops testmod_1 = { + .test_1 = (void *)test_1, + .test_2 = (void *)test_2, +}; + -- 2.34.1
From: Martin KaFai Lau <martin.lau@kernel.org> mainline inclusion from mainline-v6.9-rc1 commit c9f115564561af63db662791e9a35fcf1dfefd2a category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- The commit 9e926acda0c2 ("libbpf: Find correct module BTFs for struct_ops maps and progs.") sets a newly added field (value_type_btf_obj_fd) to -1 in libbpf when the caller of the libbpf's bpf_map_create did not define this field by passing a NULL "opts" or passing in an "opts" that does not cover this new field. OPTS_HAS(opts, field) is used to decide if the field is defined or not: ((opts) && opts->sz >= offsetofend(typeof(*(opts)), field)) Once OPTS_HAS decided the field is not defined, that field should be set to 0. For this particular new field (value_type_btf_obj_fd), its corresponding map_flags "BPF_F_VTYPE_BTF_OBJ_FD" is not set. Thus, the kernel does not treat it as an fd field. Fixes: 9e926acda0c2 ("libbpf: Find correct module BTFs for struct_ops maps and progs.") Reported-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240124224418.2905133-1-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- tools/lib/bpf/bpf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c index 6a8b495b53f5..04ccbd5b2d0d 100644 --- a/tools/lib/bpf/bpf.c +++ b/tools/lib/bpf/bpf.c @@ -194,7 +194,7 @@ int bpf_map_create(enum bpf_map_type map_type, attr.btf_key_type_id = OPTS_GET(opts, btf_key_type_id, 0); attr.btf_value_type_id = OPTS_GET(opts, btf_value_type_id, 0); attr.btf_vmlinux_value_type_id = OPTS_GET(opts, btf_vmlinux_value_type_id, 0); - attr.value_type_btf_obj_fd = OPTS_GET(opts, value_type_btf_obj_fd, -1); + attr.value_type_btf_obj_fd = OPTS_GET(opts, value_type_btf_obj_fd, 0); attr.inner_map_fd = OPTS_GET(opts, inner_map_fd, 0); attr.map_flags = OPTS_GET(opts, map_flags, 0); -- 2.34.1
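For context, the OPTS helpers referenced above look roughly like the following in libbpf's internal header (paraphrased; see tools/lib/bpf/libbpf_internal.h for the authoritative definitions):

/* A field is "defined" only if the caller's opts struct is large enough to
 * cover it; otherwise OPTS_GET() falls back to the default, which after this
 * fix is 0 for value_type_btf_obj_fd. With BPF_F_VTYPE_BTF_OBJ_FD also left
 * unset, the kernel ignores the field entirely.
 */
#define OPTS_HAS(opts, field) \
	((opts) && opts->sz >= offsetofend(typeof(*(opts)), field))
#define OPTS_GET(opts, field, fallback_value) \
	(OPTS_HAS(opts, field) ? (opts)->field : fallback_value)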
From: Kui-Feng Lee <thinker.li@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit e6be8cd5d3cf54ccd0ae66027d6f4697b15f4c3e category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- In bpf_struct_ops_map_alloc, it needs to check for NULL in the returned pointer of bpf_get_btf_vmlinux() when CONFIG_DEBUG_INFO_BTF is not set. ENOTSUPP is used to preserve the same behavior before the struct_ops kmod support. In the function check_struct_ops_btf_id(), instead of redoing the bpf_get_btf_vmlinux() that has already been done in syscall.c, the fix here is to check for prog->aux->attach_btf_id. BPF_PROG_TYPE_STRUCT_OPS must require attach_btf_id and syscall.c guarantees a valid attach_btf as long as attach_btf_id is set. When attach_btf_id is not set, this patch returns -ENOTSUPP because it is what the selftest in test_libbpf_probe_prog_types() and libbpf_probes.c are expecting for feature probing purpose. Changes from v1: - Remove an unnecessary NULL check in check_struct_ops_btf_id() Reported-by: syzbot+88f0aafe5f950d7489d7@syzkaller.appspotmail.com Closes: https://lore.kernel.org/bpf/00000000000040d68a060fc8db8c@google.com/ Reported-by: syzbot+1336f3d4b10bcda75b89@syzkaller.appspotmail.com Closes: https://lore.kernel.org/bpf/00000000000026353b060fc21c07@google.com/ Fixes: fcc2c1fb0651 ("bpf: pass attached BTF to the bpf_struct_ops subsystem") Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240126023113.1379504-1-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- kernel/bpf/bpf_struct_ops.c | 2 ++ kernel/bpf/verifier.c | 5 ++++- 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c index defc052e4622..0decd862dfe0 100644 --- a/kernel/bpf/bpf_struct_ops.c +++ b/kernel/bpf/bpf_struct_ops.c @@ -669,6 +669,8 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr) btf = bpf_get_btf_vmlinux(); if (IS_ERR(btf)) return ERR_CAST(btf); + if (!btf) + return ERR_PTR(-ENOTSUPP); } st_ops_desc = bpf_struct_ops_find_value(btf, attr->btf_vmlinux_value_type_id); diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index a3df4861690a..32ba0b623ef2 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -20221,7 +20221,10 @@ static int check_struct_ops_btf_id(struct bpf_verifier_env *env) return -EINVAL; } - btf = prog->aux->attach_btf ?: bpf_get_btf_vmlinux(); + if (!prog->aux->attach_btf_id) + return -ENOTSUPP; + + btf = prog->aux->attach_btf; if (btf_is_module(btf)) { /* Make sure st_ops is valid through the lifetime of env */ env->attach_btf_mod = btf_try_get_module(btf); -- 2.34.1
From: Daniel Xu <dxu@dxuuu.xyz> mainline inclusion from mainline-v6.9-rc1 commit 79b47344bbc5a693a92ed6b2b09dac59254bfac8 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- This commit adds support for flags on BTF_SET8s. struct btf_id_set8 already supported 32 bits worth of flags, but was only used for alignment purposes before. We now use these bits to encode flags. The first use case is tagging kfunc sets with a flag so that pahole can recognize which BTF_ID_FLAGS(func, ..) are actual kfuncs. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/r/7bb152ec76d6c2c930daec88e995bf18484a5ebb.170649139... Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- include/linux/btf_ids.h | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/include/linux/btf_ids.h b/include/linux/btf_ids.h index a9cb10b0e2e9..dca09b7f21dc 100644 --- a/include/linux/btf_ids.h +++ b/include/linux/btf_ids.h @@ -21,6 +21,7 @@ struct btf_id_set8 { #include <linux/compiler.h> /* for __PASTE */ #include <linux/compiler_attributes.h> /* for __maybe_unused */ +#include <linux/stringify.h> /* * Following macros help to define lists of BTF IDs placed @@ -183,17 +184,18 @@ extern struct btf_id_set name; * .word (1 << 3) | (1 << 1) | (1 << 2) * */ -#define __BTF_SET8_START(name, scope) \ +#define __BTF_SET8_START(name, scope, flags) \ +__BTF_ID_LIST(name, local) \ asm( \ ".pushsection " BTF_IDS_SECTION ",\"a\"; \n" \ "." #scope " __BTF_ID__set8__" #name "; \n" \ "__BTF_ID__set8__" #name ":; \n" \ -".zero 8 \n" \ +".zero 4 \n" \ +".long " __stringify(flags) "\n" \ ".popsection; \n"); #define BTF_SET8_START(name) \ -__BTF_ID_LIST(name, local) \ -__BTF_SET8_START(name, local) +__BTF_SET8_START(name, local, 0) #define BTF_SET8_END(name) \ asm( \ -- 2.34.1
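As a reading aid, the set8 layout that the .BTF_ids assembly above now emits corresponds roughly to the structure below (the actual declaration lives in include/linux/btf_ids.h; cnt and the id values are filled in by resolve_btfids at build time):

struct btf_id_set8 {
	u32 cnt;	/* number of id/flags pairs, filled by resolve_btfids */
	u32 flags;	/* set-wide flags; previously only zero padding for alignment */
	struct {
		u32 id;		/* BTF ID, filled by resolve_btfids */
		u32 flags;	/* per-entry KF_* flags from BTF_ID_FLAGS() */
	} pairs[];
};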
From: Daniel Xu <dxu@dxuuu.xyz> mainline inclusion from mainline-v6.9-rc1 commit a05e90427ef6706f59188b379ad6366b9d298bc5 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- This macro pair is functionally equivalent to BTF_SET8_START/END, except with BTF_SET8_KFUNCS flag set in the btf_id_set8 flags field. The next commit will codemod all kfunc set8s to this new variant such that all kfuncs are tagged as such in .BTF_ids section. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/r/d536c57c7c2af428686853cc7396b7a44faa53b7.170649139... Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- include/linux/btf_ids.h | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/include/linux/btf_ids.h b/include/linux/btf_ids.h index dca09b7f21dc..e24aabfe8ecc 100644 --- a/include/linux/btf_ids.h +++ b/include/linux/btf_ids.h @@ -8,6 +8,9 @@ struct btf_id_set { u32 ids[]; }; +/* This flag implies BTF_SET8 holds kfunc(s) */ +#define BTF_SET8_KFUNCS (1 << 0) + struct btf_id_set8 { u32 cnt; u32 flags; @@ -204,6 +207,12 @@ asm( \ ".popsection; \n"); \ extern struct btf_id_set8 name; +#define BTF_KFUNCS_START(name) \ +__BTF_SET8_START(name, local, BTF_SET8_KFUNCS) + +#define BTF_KFUNCS_END(name) \ +BTF_SET8_END(name) + #else #define BTF_ID_LIST(name) static u32 __maybe_unused name[64]; @@ -218,6 +227,8 @@ extern struct btf_id_set8 name; #define BTF_SET_END(name) #define BTF_SET8_START(name) static struct btf_id_set8 __maybe_unused name = { 0 }; #define BTF_SET8_END(name) +#define BTF_KFUNCS_START(name) static struct btf_id_set8 __maybe_unused name = { .flags = BTF_SET8_KFUNCS }; +#define BTF_KFUNCS_END(name) #endif /* CONFIG_DEBUG_INFO_BTF */ -- 2.34.1
From: Daniel Xu <dxu@dxuuu.xyz> mainline inclusion from mainline-v6.9-rc1 commit 6f3189f38a3e995232e028a4c341164c4aca1b20 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- This commit marks kfuncs as such inside the .BTF_ids section. The upshot of these annotations is that we'll be able to automatically generate kfunc prototypes for downstream users. The process is as follows: 1. In source, use BTF_KFUNCS_START/END macro pair to mark kfuncs 2. During build, pahole injects into BTF a "bpf_kfunc" BTF_DECL_TAG for each function inside BTF_KFUNCS sets 3. At runtime, vmlinux or module BTF is made available in sysfs 4. At runtime, bpftool (or similar) can look at provided BTF and generate appropriate prototypes for functions with "bpf_kfunc" tag To ensure future kfunc are similarly tagged, we now also return error inside kfunc registration for untagged kfuncs. For vmlinux kfuncs, we also WARN(), as initcall machinery does not handle errors. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Acked-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/e55150ceecbf0a5d961e608941165c0bee7bc943.170649139... Signed-off-by: Alexei Starovoitov <ast@kernel.org> Conflicts: fs/verity/measure.c kernel/bpf/cpumask.c kernel/bpf/helpers.c kernel/cgroup/rstat.c kernel/trace/bpf_trace.c net/core/filter.c net/core/xdp.c tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c arch/arm64/kernel/bpf-rvi.c block/blk-cgroup.c drivers/hid/bpf/hid_bpf_dispatch.c fs/proc/stat.c kernel/bpf-rvi/common_kfuncs.c kernel/cgroup/cpuset.c kernel/sched/bpf_sched.c kernel/sched/cpuacct.c [Fix conflicts because of missing some kfuns or context diff. And USE KFUNCS for self-dev kfuncs.] Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- Documentation/bpf/kfuncs.rst | 8 ++++---- arch/arm64/kernel/bpf-rvi.c | 4 ++-- block/blk-cgroup.c | 4 ++-- drivers/hid/bpf/hid_bpf_dispatch.c | 12 +++++------ fs/proc/stat.c | 4 ++-- kernel/bpf-rvi/common_kfuncs.c | 4 ++-- kernel/bpf/btf.c | 8 ++++++++ kernel/bpf/cpumask.c | 4 ++-- kernel/bpf/helpers.c | 8 ++++---- kernel/bpf/map_iter.c | 4 ++-- kernel/cgroup/cpuset.c | 4 ++-- kernel/cgroup/rstat.c | 4 ++-- kernel/sched/bpf_sched.c | 8 ++++---- kernel/sched/cpuacct.c | 4 ++-- kernel/trace/bpf_trace.c | 4 ++-- net/bpf/test_run.c | 8 ++++---- net/core/filter.c | 20 +++++++++---------- net/core/xdp.c | 4 ++-- net/ipv4/bpf_tcp_ca.c | 4 ++-- net/ipv4/fou_bpf.c | 4 ++-- net/ipv4/tcp_bbr.c | 4 ++-- net/ipv4/tcp_cubic.c | 4 ++-- net/ipv4/tcp_dctcp.c | 4 ++-- net/netfilter/nf_conntrack_bpf.c | 4 ++-- net/netfilter/nf_nat_bpf.c | 4 ++-- net/xfrm/xfrm_interface_bpf.c | 4 ++-- .../selftests/bpf/bpf_testmod/bpf_testmod.c | 8 ++++---- 27 files changed, 82 insertions(+), 74 deletions(-) diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst index 723408e399ab..18920610ab7c 100644 --- a/Documentation/bpf/kfuncs.rst +++ b/Documentation/bpf/kfuncs.rst @@ -153,10 +153,10 @@ In addition to kfuncs' arguments, verifier may need more information about the type of kfunc(s) being registered with the BPF subsystem. 
To do so, we define flags on a set of kfuncs as follows:: - BTF_SET8_START(bpf_task_set) + BTF_KFUNCS_START(bpf_task_set) BTF_ID_FLAGS(func, bpf_get_task_pid, KF_ACQUIRE | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_put_pid, KF_RELEASE) - BTF_SET8_END(bpf_task_set) + BTF_KFUNCS_END(bpf_task_set) This set encodes the BTF ID of each kfunc listed above, and encodes the flags along with it. Ofcourse, it is also allowed to specify no flags. @@ -323,10 +323,10 @@ Once the kfunc is prepared for use, the final step to making it visible is registering it with the BPF subsystem. Registration is done per BPF program type. An example is shown below:: - BTF_SET8_START(bpf_task_set) + BTF_KFUNCS_START(bpf_task_set) BTF_ID_FLAGS(func, bpf_get_task_pid, KF_ACQUIRE | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_put_pid, KF_RELEASE) - BTF_SET8_END(bpf_task_set) + BTF_KFUNCS_END(bpf_task_set) static const struct btf_kfunc_id_set bpf_task_kfunc_set = { .owner = THIS_MODULE, diff --git a/arch/arm64/kernel/bpf-rvi.c b/arch/arm64/kernel/bpf-rvi.c index 1ff460b46c85..026822aab85e 100644 --- a/arch/arm64/kernel/bpf-rvi.c +++ b/arch/arm64/kernel/bpf-rvi.c @@ -45,10 +45,10 @@ __bpf_kfunc const char *bpf_arch_flags(enum arch_flags_type t, int i) } } -BTF_SET8_START(bpf_arm64_kfunc_ids) +BTF_KFUNCS_START(bpf_arm64_kfunc_ids) BTF_ID_FLAGS(func, bpf_arm64_cpu_have_feature, KF_RCU) BTF_ID_FLAGS(func, bpf_arch_flags) -BTF_SET8_END(bpf_arm64_kfunc_ids) +BTF_KFUNCS_END(bpf_arm64_kfunc_ids) static const struct btf_kfunc_id_set bpf_arm64_kfunc_set = { .owner = THIS_MODULE, diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index eadf78bdcc40..ed387265f996 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -2342,9 +2342,9 @@ __bpf_kfunc void bpf_blkcg_get_dev_iostat(struct blkcg *blkcg, int major, int mi rcu_read_unlock(); } -BTF_SET8_START(bpf_blkcg_kfunc_ids) +BTF_KFUNCS_START(bpf_blkcg_kfunc_ids) BTF_ID_FLAGS(func, bpf_blkcg_get_dev_iostat) -BTF_SET8_END(bpf_blkcg_kfunc_ids) +BTF_KFUNCS_END(bpf_blkcg_kfunc_ids) static const struct btf_kfunc_id_set bpf_blkcg_kfunc_set = { .owner = THIS_MODULE, diff --git a/drivers/hid/bpf/hid_bpf_dispatch.c b/drivers/hid/bpf/hid_bpf_dispatch.c index 7903c8638e81..5fa53d0bea58 100644 --- a/drivers/hid/bpf/hid_bpf_dispatch.c +++ b/drivers/hid/bpf/hid_bpf_dispatch.c @@ -172,9 +172,9 @@ hid_bpf_get_data(struct hid_bpf_ctx *ctx, unsigned int offset, const size_t rdwr * The following set contains all functions we agree BPF programs * can use. 
*/ -BTF_SET8_START(hid_bpf_kfunc_ids) +BTF_KFUNCS_START(hid_bpf_kfunc_ids) BTF_ID_FLAGS(func, hid_bpf_get_data, KF_RET_NULL) -BTF_SET8_END(hid_bpf_kfunc_ids) +BTF_KFUNCS_END(hid_bpf_kfunc_ids) static const struct btf_kfunc_id_set hid_bpf_kfunc_set = { .owner = THIS_MODULE, @@ -467,11 +467,11 @@ hid_bpf_hw_request(struct hid_bpf_ctx *ctx, __u8 *buf, size_t buf__sz, } /* our HID-BPF entrypoints */ -BTF_SET8_START(hid_bpf_fmodret_ids) +BTF_KFUNCS_START(hid_bpf_fmodret_ids) BTF_ID_FLAGS(func, hid_bpf_device_event) BTF_ID_FLAGS(func, hid_bpf_rdesc_fixup) BTF_ID_FLAGS(func, __hid_bpf_tail_call) -BTF_SET8_END(hid_bpf_fmodret_ids) +BTF_KFUNCS_END(hid_bpf_fmodret_ids) static const struct btf_kfunc_id_set hid_bpf_fmodret_set = { .owner = THIS_MODULE, @@ -479,12 +479,12 @@ static const struct btf_kfunc_id_set hid_bpf_fmodret_set = { }; /* for syscall HID-BPF */ -BTF_SET8_START(hid_bpf_syscall_kfunc_ids) +BTF_KFUNCS_START(hid_bpf_syscall_kfunc_ids) BTF_ID_FLAGS(func, hid_bpf_attach_prog) BTF_ID_FLAGS(func, hid_bpf_allocate_context, KF_ACQUIRE | KF_RET_NULL) BTF_ID_FLAGS(func, hid_bpf_release_context, KF_RELEASE) BTF_ID_FLAGS(func, hid_bpf_hw_request) -BTF_SET8_END(hid_bpf_syscall_kfunc_ids) +BTF_KFUNCS_END(hid_bpf_syscall_kfunc_ids) static const struct btf_kfunc_id_set hid_bpf_syscall_kfunc_set = { .owner = THIS_MODULE, diff --git a/fs/proc/stat.c b/fs/proc/stat.c index 4757e5b1be38..dc21a2c9a25d 100644 --- a/fs/proc/stat.c +++ b/fs/proc/stat.c @@ -239,11 +239,11 @@ __bpf_kfunc void bpf_show_all_irqs(struct seq_file *p) show_all_irqs(p); } -BTF_SET8_START(bpf_proc_stat_kfunc_ids) +BTF_KFUNCS_START(bpf_proc_stat_kfunc_ids) BTF_ID_FLAGS(func, bpf_get_idle_time) BTF_ID_FLAGS(func, bpf_get_iowait_time) BTF_ID_FLAGS(func, bpf_show_all_irqs) -BTF_SET8_END(bpf_proc_stat_kfunc_ids) +BTF_KFUNCS_END(bpf_proc_stat_kfunc_ids) static const struct btf_kfunc_id_set bpf_proc_stat_kfunc_set = { .owner = THIS_MODULE, diff --git a/kernel/bpf-rvi/common_kfuncs.c b/kernel/bpf-rvi/common_kfuncs.c index b0f33fd0201d..d790a7b65da6 100644 --- a/kernel/bpf-rvi/common_kfuncs.c +++ b/kernel/bpf-rvi/common_kfuncs.c @@ -280,7 +280,7 @@ __bpf_kfunc void bpf_x86_direct_pages(unsigned long *p) } #endif -BTF_SET8_START(bpf_common_kfuncs_ids) +BTF_KFUNCS_START(bpf_common_kfuncs_ids) BTF_ID_FLAGS(func, bpf_mem_cgroup_from_task, KF_RET_NULL | KF_RCU) BTF_ID_FLAGS(func, bpf_task_active_pid_ns, KF_TRUSTED_ARGS) BTF_ID_FLAGS(func, bpf_pidns_nr_tasks) @@ -309,7 +309,7 @@ BTF_ID_FLAGS(func, bpf_mem_committed) BTF_ID_FLAGS(func, bpf_mem_vmalloc_used) BTF_ID_FLAGS(func, bpf_mem_vmalloc_total) BTF_ID_FLAGS(func, bpf_x86_direct_pages, KF_TRUSTED_ARGS) -BTF_SET8_END(bpf_common_kfuncs_ids) +BTF_KFUNCS_END(bpf_common_kfuncs_ids) static const struct btf_kfunc_id_set bpf_common_kfuncs_set = { .owner = THIS_MODULE, diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index db3e74481f65..2f8cfd27bd9e 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -8230,6 +8230,14 @@ int register_btf_kfunc_id_set(enum bpf_prog_type prog_type, { enum btf_kfunc_hook hook; + /* All kfuncs need to be tagged as such in BTF. + * WARN() for initcall registrations that do not check errors. 
+ */ + if (!(kset->set->flags & BTF_SET8_KFUNCS)) { + WARN_ON(!kset->owner); + return -EINVAL; + } + hook = bpf_prog_type_to_kfunc_hook(prog_type); return __register_btf_kfunc_id_set(hook, kset); } diff --git a/kernel/bpf/cpumask.c b/kernel/bpf/cpumask.c index 6acecc8ebd61..317f74895370 100644 --- a/kernel/bpf/cpumask.c +++ b/kernel/bpf/cpumask.c @@ -413,7 +413,7 @@ __bpf_kfunc u32 bpf_cpumask_any_and_distribute(const struct cpumask *src1, __bpf_kfunc_end_defs(); -BTF_SET8_START(cpumask_kfunc_btf_ids) +BTF_KFUNCS_START(cpumask_kfunc_btf_ids) BTF_ID_FLAGS(func, bpf_cpumask_create, KF_ACQUIRE | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_cpumask_release, KF_RELEASE) BTF_ID_FLAGS(func, bpf_cpumask_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS) @@ -438,7 +438,7 @@ BTF_ID_FLAGS(func, bpf_cpumask_full, KF_RCU) BTF_ID_FLAGS(func, bpf_cpumask_copy, KF_RCU) BTF_ID_FLAGS(func, bpf_cpumask_any_distribute, KF_RCU) BTF_ID_FLAGS(func, bpf_cpumask_any_and_distribute, KF_RCU) -BTF_SET8_END(cpumask_kfunc_btf_ids) +BTF_KFUNCS_END(cpumask_kfunc_btf_ids) static const struct btf_kfunc_id_set cpumask_kfunc_set = { .owner = THIS_MODULE, diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 0edec71be0a7..7b79779126a0 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -2627,7 +2627,7 @@ __bpf_kfunc void bpf_rcu_read_unlock(void) __bpf_kfunc_end_defs(); -BTF_SET8_START(generic_btf_ids) +BTF_KFUNCS_START(generic_btf_ids) #ifdef CONFIG_KEXEC_CORE BTF_ID_FLAGS(func, crash_kexec, KF_DESTRUCTIVE) #endif @@ -2655,7 +2655,7 @@ BTF_ID_FLAGS(func, bpf_task_from_pid, KF_ACQUIRE | KF_RET_NULL) #ifdef CONFIG_BPF_RVI BTF_ID_FLAGS(func, bpf_current_level1_reaper, KF_ACQUIRE | KF_RET_NULL) #endif -BTF_SET8_END(generic_btf_ids) +BTF_KFUNCS_END(generic_btf_ids) static const struct btf_kfunc_id_set generic_kfunc_set = { .owner = THIS_MODULE, @@ -2671,7 +2671,7 @@ BTF_ID(struct, cgroup) BTF_ID(func, bpf_cgroup_release_dtor) #endif -BTF_SET8_START(common_btf_ids) +BTF_KFUNCS_START(common_btf_ids) BTF_ID_FLAGS(func, bpf_cast_to_kern_ctx) BTF_ID_FLAGS(func, bpf_rdonly_cast) BTF_ID_FLAGS(func, bpf_rcu_read_lock) @@ -2700,7 +2700,7 @@ BTF_ID_FLAGS(func, bpf_dynptr_is_null) BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly) BTF_ID_FLAGS(func, bpf_dynptr_size) BTF_ID_FLAGS(func, bpf_dynptr_clone) -BTF_SET8_END(common_btf_ids) +BTF_KFUNCS_END(common_btf_ids) static const struct btf_kfunc_id_set common_kfunc_set = { .owner = THIS_MODULE, diff --git a/kernel/bpf/map_iter.c b/kernel/bpf/map_iter.c index 6abd7c5df4b3..9575314f40a6 100644 --- a/kernel/bpf/map_iter.c +++ b/kernel/bpf/map_iter.c @@ -213,9 +213,9 @@ __bpf_kfunc s64 bpf_map_sum_elem_count(const struct bpf_map *map) __bpf_kfunc_end_defs(); -BTF_SET8_START(bpf_map_iter_kfunc_ids) +BTF_KFUNCS_START(bpf_map_iter_kfunc_ids) BTF_ID_FLAGS(func, bpf_map_sum_elem_count, KF_TRUSTED_ARGS) -BTF_SET8_END(bpf_map_iter_kfunc_ids) +BTF_KFUNCS_END(bpf_map_iter_kfunc_ids) static const struct btf_kfunc_id_set bpf_map_iter_kfunc_set = { .owner = THIS_MODULE, diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c index dd8d502fec45..5222bf813c91 100644 --- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -5250,10 +5250,10 @@ __bpf_kfunc unsigned int bpf_cpumask_weight(struct cpumask *pmask) return cpumask_weight(pmask); } -BTF_SET8_START(bpf_cpuset_kfunc_ids) +BTF_KFUNCS_START(bpf_cpuset_kfunc_ids) BTF_ID_FLAGS(func, bpf_cpuset_from_task, KF_RET_NULL | KF_RCU) BTF_ID_FLAGS(func, bpf_cpumask_weight) -BTF_SET8_END(bpf_cpuset_kfunc_ids) +BTF_KFUNCS_END(bpf_cpuset_kfunc_ids) static const struct 
btf_kfunc_id_set bpf_cpuset_kfunc_set = { .owner = THIS_MODULE, diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c index c6d6bec9bcce..b589868cc85c 100644 --- a/kernel/cgroup/rstat.c +++ b/kernel/cgroup/rstat.c @@ -534,13 +534,13 @@ void cgroup_base_stat_cputime_show(struct seq_file *seq) } /* Add bpf kfuncs for cgroup_rstat_updated() and cgroup_rstat_flush() */ -BTF_SET8_START(bpf_rstat_kfunc_ids) +BTF_KFUNCS_START(bpf_rstat_kfunc_ids) BTF_ID_FLAGS(func, cgroup_rstat_updated) BTF_ID_FLAGS(func, cgroup_rstat_flush, KF_SLEEPABLE) #if defined(CONFIG_BPF_RVI) && !defined(CONFIG_PREEMPT_RT) BTF_ID_FLAGS(func, cgroup_rstat_flush_atomic) #endif -BTF_SET8_END(bpf_rstat_kfunc_ids) +BTF_KFUNCS_END(bpf_rstat_kfunc_ids) static const struct btf_kfunc_id_set bpf_rstat_kfunc_set = { .owner = THIS_MODULE, diff --git a/kernel/sched/bpf_sched.c b/kernel/sched/bpf_sched.c index 6849a58439b2..1039f20198aa 100644 --- a/kernel/sched/bpf_sched.c +++ b/kernel/sched/bpf_sched.c @@ -167,12 +167,12 @@ __bpf_kfunc s32 bpf_sched_cpu_stats_of(int cpuid, __diag_pop(); -BTF_SET8_START(sched_cpustats_kfunc_btf_ids) +BTF_KFUNCS_START(sched_cpustats_kfunc_btf_ids) BTF_ID_FLAGS(func, bpf_sched_cpustats_create, KF_ACQUIRE | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_sched_cpustats_release, KF_RELEASE) BTF_ID_FLAGS(func, bpf_sched_cpustats_acquire, KF_ACQUIRE | KF_TRUSTED_ARGS) BTF_ID_FLAGS(func, bpf_sched_cpu_stats_of, KF_RCU) -BTF_SET8_END(sched_cpustats_kfunc_btf_ids) +BTF_KFUNCS_END(sched_cpustats_kfunc_btf_ids) static const struct btf_kfunc_id_set cpustats_kfunc_set = { .owner = THIS_MODULE, @@ -221,12 +221,12 @@ __bpf_kfunc int bpf_sched_set_task_prefer_nid(struct task_struct *task, int nid) return set_prefer_cpus_ptr(task, cpumask_of_node(nid)); } -BTF_SET8_START(sched_task_kfunc_btf_ids) +BTF_KFUNCS_START(sched_task_kfunc_btf_ids) BTF_ID_FLAGS(func, bpf_sched_entity_is_task) BTF_ID_FLAGS(func, bpf_sched_entity_to_task) BTF_ID_FLAGS(func, bpf_sched_tag_of_entity) BTF_ID_FLAGS(func, bpf_sched_set_task_prefer_nid) -BTF_SET8_END(sched_task_kfunc_btf_ids) +BTF_KFUNCS_END(sched_task_kfunc_btf_ids) static const struct btf_kfunc_id_set sched_task_kfunc_set = { .owner = THIS_MODULE, diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c index 801f3df9734c..c479a4b9ce77 100644 --- a/kernel/sched/cpuacct.c +++ b/kernel/sched/cpuacct.c @@ -427,10 +427,10 @@ __bpf_kfunc void bpf_cpuacct_kcpustat_cpu_fetch(struct kernel_cpustat *dst, memcpy(dst, per_cpu_ptr(ca->cpustat, cpu), sizeof(struct kernel_cpustat)); } -BTF_SET8_START(bpf_cpuacct_kfunc_ids) +BTF_KFUNCS_START(bpf_cpuacct_kfunc_ids) BTF_ID_FLAGS(func, bpf_task_ca_cpuusage) BTF_ID_FLAGS(func, bpf_cpuacct_kcpustat_cpu_fetch) -BTF_SET8_END(bpf_cpuacct_kfunc_ids) +BTF_KFUNCS_END(bpf_cpuacct_kfunc_ids) static const struct btf_kfunc_id_set bpf_cpuacct_kfunc_set = { .owner = THIS_MODULE, diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index 357c7febd8ba..fe21203dbc31 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -1408,14 +1408,14 @@ __bpf_kfunc int bpf_verify_pkcs7_signature(struct bpf_dynptr_kern *data_ptr, __bpf_kfunc_end_defs(); -BTF_SET8_START(key_sig_kfunc_set) +BTF_KFUNCS_START(key_sig_kfunc_set) BTF_ID_FLAGS(func, bpf_lookup_user_key, KF_ACQUIRE | KF_RET_NULL | KF_SLEEPABLE) BTF_ID_FLAGS(func, bpf_lookup_system_key, KF_ACQUIRE | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_key_put, KF_RELEASE) #ifdef CONFIG_SYSTEM_DATA_VERIFICATION BTF_ID_FLAGS(func, bpf_verify_pkcs7_signature, KF_SLEEPABLE) #endif -BTF_SET8_END(key_sig_kfunc_set) 
+BTF_KFUNCS_END(key_sig_kfunc_set) static const struct btf_kfunc_id_set bpf_key_sig_kfunc_set = { .owner = THIS_MODULE, diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index 63a875c638b6..a343143c7888 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -618,21 +618,21 @@ CFI_NOSEAL(bpf_kfunc_call_memb_release_dtor); __bpf_kfunc_end_defs(); -BTF_SET8_START(bpf_test_modify_return_ids) +BTF_KFUNCS_START(bpf_test_modify_return_ids) BTF_ID_FLAGS(func, bpf_modify_return_test) BTF_ID_FLAGS(func, bpf_modify_return_test2) BTF_ID_FLAGS(func, bpf_fentry_test1, KF_SLEEPABLE) -BTF_SET8_END(bpf_test_modify_return_ids) +BTF_KFUNCS_END(bpf_test_modify_return_ids) static const struct btf_kfunc_id_set bpf_test_modify_return_set = { .owner = THIS_MODULE, .set = &bpf_test_modify_return_ids, }; -BTF_SET8_START(test_sk_check_kfunc_ids) +BTF_KFUNCS_START(test_sk_check_kfunc_ids) BTF_ID_FLAGS(func, bpf_kfunc_call_test_release, KF_RELEASE) BTF_ID_FLAGS(func, bpf_kfunc_call_memb_release, KF_RELEASE) -BTF_SET8_END(test_sk_check_kfunc_ids) +BTF_KFUNCS_END(test_sk_check_kfunc_ids) static void *bpf_test_init(const union bpf_attr *kattr, u32 user_size, u32 size, u32 headroom, u32 tailroom) diff --git a/net/core/filter.c b/net/core/filter.c index fe54ad9b0e8f..d5a88790db2f 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -12272,27 +12272,27 @@ int bpf_dynptr_from_skb_rdonly(struct sk_buff *skb, u64 flags, return 0; } -BTF_SET8_START(bpf_kfunc_check_set_skb) +BTF_KFUNCS_START(bpf_kfunc_check_set_skb) BTF_ID_FLAGS(func, bpf_dynptr_from_skb) -BTF_SET8_END(bpf_kfunc_check_set_skb) +BTF_KFUNCS_END(bpf_kfunc_check_set_skb) -BTF_SET8_START(bpf_kfunc_check_set_xdp) +BTF_KFUNCS_START(bpf_kfunc_check_set_xdp) BTF_ID_FLAGS(func, bpf_dynptr_from_xdp) -BTF_SET8_END(bpf_kfunc_check_set_xdp) +BTF_KFUNCS_END(bpf_kfunc_check_set_xdp) -BTF_SET8_START(bpf_kfunc_check_set_sock_addr) +BTF_KFUNCS_START(bpf_kfunc_check_set_sock_addr) BTF_ID_FLAGS(func, bpf_sock_addr_set_sun_path) -BTF_SET8_END(bpf_kfunc_check_set_sock_addr) +BTF_KFUNCS_END(bpf_kfunc_check_set_sock_addr) #ifdef CONFIG_HISOCK -BTF_SET8_START(bpf_kfunc_check_set_hisock) +BTF_KFUNCS_START(bpf_kfunc_check_set_hisock) BTF_ID_FLAGS(func, bpf_set_ingress_dst) BTF_ID_FLAGS(func, bpf_get_skb_ethhdr) BTF_ID_FLAGS(func, bpf_set_ingress_dev) BTF_ID_FLAGS(func, bpf_set_egress_dev) BTF_ID_FLAGS(func, bpf_handle_ingress_ptype) BTF_ID_FLAGS(func, bpf_handle_egress_ptype) -BTF_SET8_END(bpf_kfunc_check_set_hisock) +BTF_KFUNCS_END(bpf_kfunc_check_set_hisock) #endif static const struct btf_kfunc_id_set bpf_kfunc_set_skb = { @@ -12376,9 +12376,9 @@ __bpf_kfunc int bpf_sock_destroy(struct sock_common *sock) __bpf_kfunc_end_defs(); -BTF_SET8_START(bpf_sk_iter_kfunc_ids) +BTF_KFUNCS_START(bpf_sk_iter_kfunc_ids) BTF_ID_FLAGS(func, bpf_sock_destroy, KF_TRUSTED_ARGS) -BTF_SET8_END(bpf_sk_iter_kfunc_ids) +BTF_KFUNCS_END(bpf_sk_iter_kfunc_ids) static int tracing_iter_filter(const struct bpf_prog *prog, u32 kfunc_id) { diff --git a/net/core/xdp.c b/net/core/xdp.c index 1642222e350b..39738e00e732 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -734,11 +734,11 @@ __bpf_kfunc int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, u32 *hash, __bpf_kfunc_end_defs(); -BTF_SET8_START(xdp_metadata_kfunc_ids) +BTF_KFUNCS_START(xdp_metadata_kfunc_ids) #define XDP_METADATA_KFUNC(_, name) BTF_ID_FLAGS(func, name, KF_TRUSTED_ARGS) XDP_METADATA_KFUNC_xxx #undef XDP_METADATA_KFUNC -BTF_SET8_END(xdp_metadata_kfunc_ids) +BTF_KFUNCS_END(xdp_metadata_kfunc_ids) static const struct btf_kfunc_id_set 
xdp_metadata_kfunc_set = { .owner = THIS_MODULE, diff --git a/net/ipv4/bpf_tcp_ca.c b/net/ipv4/bpf_tcp_ca.c index 8e7716256d3c..5d83d7b058fa 100644 --- a/net/ipv4/bpf_tcp_ca.c +++ b/net/ipv4/bpf_tcp_ca.c @@ -201,13 +201,13 @@ bpf_tcp_ca_get_func_proto(enum bpf_func_id func_id, } } -BTF_SET8_START(bpf_tcp_ca_check_kfunc_ids) +BTF_KFUNCS_START(bpf_tcp_ca_check_kfunc_ids) BTF_ID_FLAGS(func, tcp_reno_ssthresh) BTF_ID_FLAGS(func, tcp_reno_cong_avoid) BTF_ID_FLAGS(func, tcp_reno_undo_cwnd) BTF_ID_FLAGS(func, tcp_slow_start) BTF_ID_FLAGS(func, tcp_cong_avoid_ai) -BTF_SET8_END(bpf_tcp_ca_check_kfunc_ids) +BTF_KFUNCS_END(bpf_tcp_ca_check_kfunc_ids) static const struct btf_kfunc_id_set bpf_tcp_ca_kfunc_set = { .owner = THIS_MODULE, diff --git a/net/ipv4/fou_bpf.c b/net/ipv4/fou_bpf.c index 4da03bf45c9b..06e5572f296f 100644 --- a/net/ipv4/fou_bpf.c +++ b/net/ipv4/fou_bpf.c @@ -100,10 +100,10 @@ __bpf_kfunc int bpf_skb_get_fou_encap(struct __sk_buff *skb_ctx, __bpf_kfunc_end_defs(); -BTF_SET8_START(fou_kfunc_set) +BTF_KFUNCS_START(fou_kfunc_set) BTF_ID_FLAGS(func, bpf_skb_set_fou_encap) BTF_ID_FLAGS(func, bpf_skb_get_fou_encap) -BTF_SET8_END(fou_kfunc_set) +BTF_KFUNCS_END(fou_kfunc_set) static const struct btf_kfunc_id_set fou_bpf_kfunc_set = { .owner = THIS_MODULE, diff --git a/net/ipv4/tcp_bbr.c b/net/ipv4/tcp_bbr.c index 146792cd26fe..56bb7a9621ab 100644 --- a/net/ipv4/tcp_bbr.c +++ b/net/ipv4/tcp_bbr.c @@ -1154,7 +1154,7 @@ static struct tcp_congestion_ops tcp_bbr_cong_ops __read_mostly = { .set_state = bbr_set_state, }; -BTF_SET8_START(tcp_bbr_check_kfunc_ids) +BTF_KFUNCS_START(tcp_bbr_check_kfunc_ids) #ifdef CONFIG_X86 #ifdef CONFIG_DYNAMIC_FTRACE BTF_ID_FLAGS(func, bbr_init) @@ -1167,7 +1167,7 @@ BTF_ID_FLAGS(func, bbr_min_tso_segs) BTF_ID_FLAGS(func, bbr_set_state) #endif #endif -BTF_SET8_END(tcp_bbr_check_kfunc_ids) +BTF_KFUNCS_END(tcp_bbr_check_kfunc_ids) static const struct btf_kfunc_id_set tcp_bbr_kfunc_set = { .owner = THIS_MODULE, diff --git a/net/ipv4/tcp_cubic.c b/net/ipv4/tcp_cubic.c index 5ff7be13deb6..103e93f7a9f1 100644 --- a/net/ipv4/tcp_cubic.c +++ b/net/ipv4/tcp_cubic.c @@ -487,7 +487,7 @@ static struct tcp_congestion_ops cubictcp __read_mostly = { .name = "cubic", }; -BTF_SET8_START(tcp_cubic_check_kfunc_ids) +BTF_KFUNCS_START(tcp_cubic_check_kfunc_ids) #ifdef CONFIG_X86 #ifdef CONFIG_DYNAMIC_FTRACE BTF_ID_FLAGS(func, cubictcp_init) @@ -498,7 +498,7 @@ BTF_ID_FLAGS(func, cubictcp_cwnd_event) BTF_ID_FLAGS(func, cubictcp_acked) #endif #endif -BTF_SET8_END(tcp_cubic_check_kfunc_ids) +BTF_KFUNCS_END(tcp_cubic_check_kfunc_ids) static const struct btf_kfunc_id_set tcp_cubic_kfunc_set = { .owner = THIS_MODULE, diff --git a/net/ipv4/tcp_dctcp.c b/net/ipv4/tcp_dctcp.c index 8ad62713b0ba..b004280855f8 100644 --- a/net/ipv4/tcp_dctcp.c +++ b/net/ipv4/tcp_dctcp.c @@ -271,7 +271,7 @@ static struct tcp_congestion_ops dctcp_reno __read_mostly = { .name = "dctcp-reno", }; -BTF_SET8_START(tcp_dctcp_check_kfunc_ids) +BTF_KFUNCS_START(tcp_dctcp_check_kfunc_ids) #ifdef CONFIG_X86 #ifdef CONFIG_DYNAMIC_FTRACE BTF_ID_FLAGS(func, dctcp_init) @@ -282,7 +282,7 @@ BTF_ID_FLAGS(func, dctcp_cwnd_undo) BTF_ID_FLAGS(func, dctcp_state) #endif #endif -BTF_SET8_END(tcp_dctcp_check_kfunc_ids) +BTF_KFUNCS_END(tcp_dctcp_check_kfunc_ids) static const struct btf_kfunc_id_set tcp_dctcp_kfunc_set = { .owner = THIS_MODULE, diff --git a/net/netfilter/nf_conntrack_bpf.c b/net/netfilter/nf_conntrack_bpf.c index 475358ec8212..d2492d050fe6 100644 --- a/net/netfilter/nf_conntrack_bpf.c +++ b/net/netfilter/nf_conntrack_bpf.c 
@@ -467,7 +467,7 @@ __bpf_kfunc int bpf_ct_change_status(struct nf_conn *nfct, u32 status) __bpf_kfunc_end_defs(); -BTF_SET8_START(nf_ct_kfunc_set) +BTF_KFUNCS_START(nf_ct_kfunc_set) BTF_ID_FLAGS(func, bpf_xdp_ct_alloc, KF_ACQUIRE | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_xdp_ct_lookup, KF_ACQUIRE | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_skb_ct_alloc, KF_ACQUIRE | KF_RET_NULL) @@ -478,7 +478,7 @@ BTF_ID_FLAGS(func, bpf_ct_set_timeout, KF_TRUSTED_ARGS) BTF_ID_FLAGS(func, bpf_ct_change_timeout, KF_TRUSTED_ARGS) BTF_ID_FLAGS(func, bpf_ct_set_status, KF_TRUSTED_ARGS) BTF_ID_FLAGS(func, bpf_ct_change_status, KF_TRUSTED_ARGS) -BTF_SET8_END(nf_ct_kfunc_set) +BTF_KFUNCS_END(nf_ct_kfunc_set) static const struct btf_kfunc_id_set nf_conntrack_kfunc_set = { .owner = THIS_MODULE, diff --git a/net/netfilter/nf_nat_bpf.c b/net/netfilter/nf_nat_bpf.c index 6e3b2f58855f..481be15609b1 100644 --- a/net/netfilter/nf_nat_bpf.c +++ b/net/netfilter/nf_nat_bpf.c @@ -54,9 +54,9 @@ __bpf_kfunc int bpf_ct_set_nat_info(struct nf_conn___init *nfct, __bpf_kfunc_end_defs(); -BTF_SET8_START(nf_nat_kfunc_set) +BTF_KFUNCS_START(nf_nat_kfunc_set) BTF_ID_FLAGS(func, bpf_ct_set_nat_info, KF_TRUSTED_ARGS) -BTF_SET8_END(nf_nat_kfunc_set) +BTF_KFUNCS_END(nf_nat_kfunc_set) static const struct btf_kfunc_id_set nf_bpf_nat_kfunc_set = { .owner = THIS_MODULE, diff --git a/net/xfrm/xfrm_interface_bpf.c b/net/xfrm/xfrm_interface_bpf.c index 7d5e920141e9..5ea15037ebd1 100644 --- a/net/xfrm/xfrm_interface_bpf.c +++ b/net/xfrm/xfrm_interface_bpf.c @@ -93,10 +93,10 @@ __bpf_kfunc int bpf_skb_set_xfrm_info(struct __sk_buff *skb_ctx, const struct bp __bpf_kfunc_end_defs(); -BTF_SET8_START(xfrm_ifc_kfunc_set) +BTF_KFUNCS_START(xfrm_ifc_kfunc_set) BTF_ID_FLAGS(func, bpf_skb_get_xfrm_info) BTF_ID_FLAGS(func, bpf_skb_set_xfrm_info) -BTF_SET8_END(xfrm_ifc_kfunc_set) +BTF_KFUNCS_END(xfrm_ifc_kfunc_set) static const struct btf_kfunc_id_set xfrm_interface_kfunc_set = { .owner = THIS_MODULE, diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c index ca863f436978..b4133a1b7a72 100644 --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c @@ -339,11 +339,11 @@ static struct bin_attribute bin_attr_bpf_testmod_file __ro_after_init = { .write = bpf_testmod_test_write, }; -BTF_SET8_START(bpf_testmod_common_kfunc_ids) +BTF_KFUNCS_START(bpf_testmod_common_kfunc_ids) BTF_ID_FLAGS(func, bpf_iter_testmod_seq_new, KF_ITER_NEW) BTF_ID_FLAGS(func, bpf_iter_testmod_seq_next, KF_ITER_NEXT | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_iter_testmod_seq_destroy, KF_ITER_DESTROY) -BTF_SET8_END(bpf_testmod_common_kfunc_ids) +BTF_KFUNCS_END(bpf_testmod_common_kfunc_ids) static const struct btf_kfunc_id_set bpf_testmod_common_kfunc_set = { .owner = THIS_MODULE, @@ -489,7 +489,7 @@ __bpf_kfunc static u32 bpf_kfunc_call_test_static_unused_arg(u32 arg, u32 unused return arg; } -BTF_SET8_START(bpf_testmod_check_kfunc_ids) +BTF_KFUNCS_START(bpf_testmod_check_kfunc_ids) BTF_ID_FLAGS(func, bpf_testmod_test_mod_kfunc) BTF_ID_FLAGS(func, bpf_kfunc_call_test1) BTF_ID_FLAGS(func, bpf_kfunc_call_test2) @@ -515,7 +515,7 @@ BTF_ID_FLAGS(func, bpf_kfunc_call_test_ref, KF_TRUSTED_ARGS | KF_RCU) BTF_ID_FLAGS(func, bpf_kfunc_call_test_destructive, KF_DESTRUCTIVE) BTF_ID_FLAGS(func, bpf_kfunc_call_test_static_unused_arg) BTF_ID_FLAGS(func, bpf_kfunc_call_test_offset) -BTF_SET8_END(bpf_testmod_check_kfunc_ids) +BTF_KFUNCS_END(bpf_testmod_check_kfunc_ids) static int 
bpf_testmod_ops_init(struct btf *btf) { -- 2.34.1
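A compact sketch of the registration pattern this treewide change converges on for a module; the kfunc, set, and program type names below are illustrative only and not taken from the series.

#include <linux/bpf.h>
#include <linux/btf.h>
#include <linux/btf_ids.h>
#include <linux/module.h>

__bpf_kfunc_start_defs();

__bpf_kfunc int bpf_example_add_one(int x)
{
	return x + 1;
}

__bpf_kfunc_end_defs();

/* BTF_KFUNCS_START sets BTF_SET8_KFUNCS in the set's flags, so pahole can
 * emit the "bpf_kfunc" decl tag for every function listed in the set.
 */
BTF_KFUNCS_START(example_kfunc_ids)
BTF_ID_FLAGS(func, bpf_example_add_one)
BTF_KFUNCS_END(example_kfunc_ids)

static const struct btf_kfunc_id_set example_kfunc_set = {
	.owner = THIS_MODULE,
	.set   = &example_kfunc_ids,
};

static int __init example_init(void)
{
	/* With the new check, a set still declared via BTF_SET8_START/END is
	 * rejected here with -EINVAL (plus a WARN() for built-in callers,
	 * whose kset->owner is NULL).
	 */
	return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &example_kfunc_set);
}
module_init(example_init);
MODULE_LICENSE("GPL");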
From: Andrii Nakryiko <andrii@kernel.org> mainline inclusion from mainline-v6.9-rc1 commit c81a8ab196b5083d5109a51585fcc24fa2055a77 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Seems like original commit adding split BTF support intended to add btf__new_split() API, and even declared it in libbpf.map, but never added (trivial) implementation. Fix this. Fixes: ba451366bf44 ("libbpf: Implement basic split BTF support") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20240201172027.604869-4-andrii@kernel.org Conflicts: tools/lib/bpf/libbpf.map [The conflicts were due to some minor issue] Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- tools/lib/bpf/btf.c | 5 +++++ tools/lib/bpf/libbpf.map | 2 +- 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c index 2e9f28cece3f..2fe479fa0c77 100644 --- a/tools/lib/bpf/btf.c +++ b/tools/lib/bpf/btf.c @@ -919,6 +919,11 @@ struct btf *btf__new(const void *data, __u32 size) return libbpf_ptr(btf_new(data, size, NULL)); } +struct btf *btf__new_split(const void *data, __u32 size, struct btf *base_btf) +{ + return libbpf_ptr(btf_new(data, size, base_btf)); +} + static struct btf *btf_parse_elf(const char *path, struct btf *base_btf, struct btf_ext **btf_ext) { diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map index 228ab00a5e69..f802a77d10b6 100644 --- a/tools/lib/bpf/libbpf.map +++ b/tools/lib/bpf/libbpf.map @@ -246,7 +246,6 @@ LIBBPF_0.3.0 { btf__parse_raw_split; btf__parse_split; btf__new_empty_split; - btf__new_split; ring_buffer__epoll_fd; } LIBBPF_0.2.0; @@ -395,6 +394,7 @@ LIBBPF_1.2.0 { LIBBPF_1.3.0 { global: + btf__new_split; bpf_obj_pin_opts; bpf_object__unpin; bpf_prog_detach_opts; -- 2.34.1
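A brief usage sketch for the now-implemented API; reading the module's raw BTF bytes is left to the caller, and error handling is abbreviated.

#include <bpf/btf.h>

/* Sketch: build a split BTF for a kernel module on top of vmlinux BTF using
 * btf__new_split(). raw_data/raw_size would typically be the contents of
 * /sys/kernel/btf/<module>. The base BTF must stay alive (not be freed) for
 * as long as the returned split BTF is in use.
 */
static struct btf *load_module_btf(const void *raw_data, __u32 raw_size)
{
	struct btf *base, *mod;

	base = btf__parse("/sys/kernel/btf/vmlinux", NULL);
	if (!base)
		return NULL;

	mod = btf__new_split(raw_data, raw_size, base);
	if (!mod) {
		btf__free(base);
		return NULL;
	}
	return mod;
}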
From: Kui-Feng Lee <thinker.li@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit df9705eaa0bad034dad0f73386ff82f5c4dd7e24 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- The "i" here is always equal to "btf_type_vlen(t)" since the "for_each_member()" loop never breaks. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240203055119.2235598-1-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- kernel/bpf/bpf_struct_ops.c | 21 +++++++++------------ 1 file changed, 9 insertions(+), 12 deletions(-) diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c index 0decd862dfe0..f98f580de77a 100644 --- a/kernel/bpf/bpf_struct_ops.c +++ b/kernel/bpf/bpf_struct_ops.c @@ -189,20 +189,17 @@ int bpf_struct_ops_desc_init(struct bpf_struct_ops_desc *st_ops_desc, } } - if (i == btf_type_vlen(t)) { - if (st_ops->init(btf)) { - pr_warn("Error in init bpf_struct_ops %s\n", - st_ops->name); - return -EINVAL; - } else { - st_ops_desc->type_id = type_id; - st_ops_desc->type = t; - st_ops_desc->value_id = value_id; - st_ops_desc->value_type = btf_type_by_id(btf, - value_id); - } + if (st_ops->init(btf)) { + pr_warn("Error in init bpf_struct_ops %s\n", + st_ops->name); + return -EINVAL; } + st_ops_desc->type_id = type_id; + st_ops_desc->type = t; + st_ops_desc->value_id = value_id; + st_ops_desc->value_type = btf_type_by_id(btf, value_id); + return 0; } -- 2.34.1
From: Kui-Feng Lee <thinker.li@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit 169e650069647325496e5a65408ff301874c8e01 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- "r" is used to receive the return value of test_2 in bpf_testmod.c, but it is not actually used. So, we remove "r" and change the return type to "void". Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202401300557.z5vzn8FM-lkp@intel.com/ Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240204061204.1864529-1-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c | 6 ++---- tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h | 2 +- tools/testing/selftests/bpf/progs/struct_ops_module.c | 3 +-- 3 files changed, 4 insertions(+), 7 deletions(-) diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c index b4133a1b7a72..2e1b5d8f0fa2 100644 --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c @@ -549,9 +549,8 @@ static const struct bpf_verifier_ops bpf_testmod_verifier_ops = { static int bpf_dummy_reg(void *kdata) { struct bpf_testmod_ops *ops = kdata; - int r; - r = ops->test_2(4, 3); + ops->test_2(4, 3); return 0; } @@ -565,9 +564,8 @@ static int bpf_testmod_test_1(void) return 0; } -static int bpf_testmod_test_2(int a, int b) +static void bpf_testmod_test_2(int a, int b) { - return 0; } static struct bpf_testmod_ops __bpf_testmod_ops = { diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h index ca5435751c79..537beca42896 100644 --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h @@ -30,7 +30,7 @@ struct bpf_iter_testmod_seq { struct bpf_testmod_ops { int (*test_1)(void); - int (*test_2)(int a, int b); + void (*test_2)(int a, int b); }; #endif /* _BPF_TESTMOD_H */ diff --git a/tools/testing/selftests/bpf/progs/struct_ops_module.c b/tools/testing/selftests/bpf/progs/struct_ops_module.c index e44ac55195ca..b78746b3cef3 100644 --- a/tools/testing/selftests/bpf/progs/struct_ops_module.c +++ b/tools/testing/selftests/bpf/progs/struct_ops_module.c @@ -16,10 +16,9 @@ int BPF_PROG(test_1) } SEC("struct_ops/test_2") -int BPF_PROG(test_2, int a, int b) +void BPF_PROG(test_2, int a, int b) { test_2_result = a + b; - return a + b; } SEC(".struct_ops.link") -- 2.34.1
From: Kumar Kartikeya Dwivedi <memxor@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit a44b1334aadd82203f661adb9adb41e53ad0e8d1 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Currently, calling any helpers, kfuncs, or subprogs except the graph data structure (lists, rbtrees) API kfuncs while holding a bpf_spin_lock is not allowed. One of the original motivations of this decision was to force the BPF programmer's hand into keeping the bpf_spin_lock critical section small, and to ensure the execution time of the program does not increase due to lock waiting times. In addition to this, some of the helpers and kfuncs may be unsafe to call while holding a bpf_spin_lock. However, when it comes to subprog calls, at least for static subprogs, the verifier is able to explore their instructions during verification. Therefore, it is similar in effect to having the same code inlined into the critical section. Hence, not allowing static subprog calls in the bpf_spin_lock critical section is mostly an annoyance that needs to be worked around, without providing any tangible benefit. Unlike static subprog calls, global subprog calls are not safe to permit within the critical section, as the verifier does not explore them during verification, therefore whether the same lock will be taken again, or unlocked, cannot be ascertained. Therefore, allow calling static subprogs within a bpf_spin_lock critical section, and only reject it in case the subprog linkage is global. Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: David Vernet <void@manifault.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20240204222349.938118-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Conflicts: kernel/bpf/verifier.c [Fix context conflicts because of missing BPF exceptions.] Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- kernel/bpf/verifier.c | 11 ++++++++--- .../testing/selftests/bpf/progs/verifier_spin_lock.c | 2 +- 2 files changed, 9 insertions(+), 4 deletions(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 32ba0b623ef2..827841400e63 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -9490,6 +9490,13 @@ static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn, if (subprog_is_global(env, subprog)) { const char *sub_name = subprog_name(env, subprog); + /* Only global subprogs cannot be called with a lock held.
*/ + if (env->cur_state->active_lock.ptr) { + verbose(env, "global function calls are not allowed while holding a lock,\n" + "use static function instead\n"); + return -EINVAL; + } + if (err) { verbose(env, "Caller passes invalid args into func#%d ('%s')\n", subprog, sub_name); @@ -17528,7 +17535,6 @@ static int do_check(struct bpf_verifier_env *env) if (env->cur_state->active_lock.ptr) { if ((insn->src_reg == BPF_REG_0 && insn->imm != BPF_FUNC_spin_unlock) || - (insn->src_reg == BPF_PSEUDO_CALL) || (insn->src_reg == BPF_PSEUDO_KFUNC_CALL && (insn->off != 0 || !is_bpf_graph_api_kfunc(insn->imm)))) { verbose(env, "function calls are not allowed while holding a lock\n"); @@ -17571,8 +17577,7 @@ static int do_check(struct bpf_verifier_env *env) return -EINVAL; } - if (env->cur_state->active_lock.ptr && - !in_rbtree_lock_required_cb(env)) { + if (env->cur_state->active_lock.ptr && !env->cur_state->curframe) { verbose(env, "bpf_spin_unlock is missing\n"); return -EINVAL; } diff --git a/tools/testing/selftests/bpf/progs/verifier_spin_lock.c b/tools/testing/selftests/bpf/progs/verifier_spin_lock.c index 9c1aa69650f8..fb316c080c84 100644 --- a/tools/testing/selftests/bpf/progs/verifier_spin_lock.c +++ b/tools/testing/selftests/bpf/progs/verifier_spin_lock.c @@ -330,7 +330,7 @@ l1_%=: r7 = r0; \ SEC("cgroup/skb") __description("spin_lock: test10 lock in subprog without unlock") -__failure __msg("unlock is missing") +__success __failure_unpriv __msg_unpriv("") __naked void lock_in_subprog_without_unlock(void) { -- 2.34.1
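To make the accepted and rejected cases concrete, here is a minimal BPF C sketch (illustrative only, not part of the patch; the selftests added in the next patch exercise the same patterns):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct bpf_spin_lock lock __hidden SEC(".data.lock");

static __noinline int static_helper(int v)
{
	return v + 1;
}

__noinline int global_helper(int v)	/* global linkage */
{
	return v + 1;
}

SEC("tc")
int under_lock(struct __sk_buff *ctx)
{
	int ret;

	bpf_spin_lock(&lock);
	ret = static_helper(ctx->len);		/* accepted after this patch */
	/* ret = global_helper(ctx->len);	   still rejected: "global function
	 * calls are not allowed while holding a lock" */
	bpf_spin_unlock(&lock);
	return ret;
}

char _license[] SEC("license") = "GPL";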
From: Kumar Kartikeya Dwivedi <memxor@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit e8699c4ff85baedcf40f33db816cc487cee39397 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Add selftests for static subprog calls within bpf_spin_lock critical section, and ensure we still reject global subprog calls. Also test the case where a subprog call will unlock the caller's held lock, or the caller will unlock a lock taken by a subprog call, ensuring correct transfer of lock state across frames on exit. Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: David Vernet <void@manifault.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20240204222349.938118-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- .../selftests/bpf/prog_tests/spin_lock.c | 2 + .../selftests/bpf/progs/test_spin_lock.c | 65 +++++++++++++++++++ .../selftests/bpf/progs/test_spin_lock_fail.c | 44 +++++++++++++ 3 files changed, 111 insertions(+) diff --git a/tools/testing/selftests/bpf/prog_tests/spin_lock.c b/tools/testing/selftests/bpf/prog_tests/spin_lock.c index f29c08d93beb..11814d6b81e4 100644 --- a/tools/testing/selftests/bpf/prog_tests/spin_lock.c +++ b/tools/testing/selftests/bpf/prog_tests/spin_lock.c @@ -48,6 +48,8 @@ static struct { { "lock_id_mismatch_innermapval_kptr", "bpf_spin_unlock of different lock" }, { "lock_id_mismatch_innermapval_global", "bpf_spin_unlock of different lock" }, { "lock_id_mismatch_innermapval_mapval", "bpf_spin_unlock of different lock" }, + { "lock_global_subprog_call1", "global function calls are not allowed while holding a lock" }, + { "lock_global_subprog_call2", "global function calls are not allowed while holding a lock" }, }; static int match_regex(const char *pattern, const char *string) diff --git a/tools/testing/selftests/bpf/progs/test_spin_lock.c b/tools/testing/selftests/bpf/progs/test_spin_lock.c index b2440a0ff422..d8d77bdffd3d 100644 --- a/tools/testing/selftests/bpf/progs/test_spin_lock.c +++ b/tools/testing/selftests/bpf/progs/test_spin_lock.c @@ -101,4 +101,69 @@ int bpf_spin_lock_test(struct __sk_buff *skb) err: return err; } + +struct bpf_spin_lock lockA __hidden SEC(".data.A"); + +__noinline +static int static_subprog(struct __sk_buff *ctx) +{ + volatile int ret = 0; + + if (ctx->protocol) + return ret; + return ret + ctx->len; +} + +__noinline +static int static_subprog_lock(struct __sk_buff *ctx) +{ + volatile int ret = 0; + + ret = static_subprog(ctx); + bpf_spin_lock(&lockA); + return ret + ctx->len; +} + +__noinline +static int static_subprog_unlock(struct __sk_buff *ctx) +{ + volatile int ret = 0; + + ret = static_subprog(ctx); + bpf_spin_unlock(&lockA); + return ret + ctx->len; +} + +SEC("tc") +int lock_static_subprog_call(struct __sk_buff *ctx) +{ + int ret = 0; + + bpf_spin_lock(&lockA); + if (ctx->mark == 42) + ret = static_subprog(ctx); + bpf_spin_unlock(&lockA); + return ret; +} + +SEC("tc") +int lock_static_subprog_lock(struct __sk_buff *ctx) +{ + int ret = 0; + + ret = static_subprog_lock(ctx); + bpf_spin_unlock(&lockA); + return ret; +} + +SEC("tc") +int lock_static_subprog_unlock(struct __sk_buff *ctx) +{ + int ret = 0; + + bpf_spin_lock(&lockA); + ret = static_subprog_unlock(ctx); + return ret; +} + char _license[] 
SEC("license") = "GPL"; diff --git a/tools/testing/selftests/bpf/progs/test_spin_lock_fail.c b/tools/testing/selftests/bpf/progs/test_spin_lock_fail.c index 293ac1049d38..1c8b678e2e9a 100644 --- a/tools/testing/selftests/bpf/progs/test_spin_lock_fail.c +++ b/tools/testing/selftests/bpf/progs/test_spin_lock_fail.c @@ -201,4 +201,48 @@ CHECK(innermapval_mapval, &iv->lock, &v->lock); #undef CHECK +__noinline +int global_subprog(struct __sk_buff *ctx) +{ + volatile int ret = 0; + + if (ctx->protocol) + ret += ctx->protocol; + return ret + ctx->mark; +} + +__noinline +static int static_subprog_call_global(struct __sk_buff *ctx) +{ + volatile int ret = 0; + + if (ctx->protocol) + return ret; + return ret + ctx->len + global_subprog(ctx); +} + +SEC("?tc") +int lock_global_subprog_call1(struct __sk_buff *ctx) +{ + int ret = 0; + + bpf_spin_lock(&lockA); + if (ctx->mark == 42) + ret = global_subprog(ctx); + bpf_spin_unlock(&lockA); + return ret; +} + +SEC("?tc") +int lock_global_subprog_call2(struct __sk_buff *ctx) +{ + int ret = 0; + + bpf_spin_lock(&lockA); + if (ctx->mark == 42) + ret = static_subprog_call_global(ctx); + bpf_spin_unlock(&lockA); + return ret; +} + char _license[] SEC("license") = "GPL"; -- 2.34.1
From: Kumar Kartikeya Dwivedi <memxor@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit 6fceea0fa59f6786a2847a4cae409117624e8b58 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Allow transferring an imbalanced RCU lock state between subprog calls during verification. This allows patterns where a subprog call returns with an RCU lock held, or a subprog call releases an RCU lock held by the caller. Currently, the verifier would end up complaining if the RCU lock is not released when processing an exit from a subprog, which is non-ideal if its execution is supposed to be enclosed in an RCU read section of the caller. Instead, simply only check whether we are processing exit for frame#0 and do not complain on an active RCU lock otherwise. We only need to update the check when processing BPF_EXIT insn, as copy_verifier_state is already set up to do the right thing. Suggested-by: David Vernet <void@manifault.com> Tested-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20240205055646.1112186-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- kernel/bpf/verifier.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 827841400e63..0aae4ff45aee 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -17582,8 +17582,7 @@ static int do_check(struct bpf_verifier_env *env) return -EINVAL; } - if (env->cur_state->active_rcu_lock && - !in_rbtree_lock_required_cb(env)) { + if (env->cur_state->active_rcu_lock && !env->cur_state->curframe) { verbose(env, "bpf_rcu_read_unlock is missing\n"); return -EINVAL; } -- 2.34.1
From: Kumar Kartikeya Dwivedi <memxor@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit 8be6a0147af314fd60db9da2158cd737dc6394a7 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Add selftests covering the following cases: - A static or global subprog called from within a RCU read section works - A static subprog taking an RCU read lock which is released in caller works - A static subprog releasing the caller's RCU read lock works Global subprogs that leave the lock in an imbalanced state will not work, as they are verified separately, so ensure those cases fail as well. Acked-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20240205055646.1112186-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- .../selftests/bpf/prog_tests/rcu_read_lock.c | 6 + .../selftests/bpf/progs/rcu_read_lock.c | 120 ++++++++++++++++++ 2 files changed, 126 insertions(+) diff --git a/tools/testing/selftests/bpf/prog_tests/rcu_read_lock.c b/tools/testing/selftests/bpf/prog_tests/rcu_read_lock.c index 3f1f58d3a729..a1f7e7378a64 100644 --- a/tools/testing/selftests/bpf/prog_tests/rcu_read_lock.c +++ b/tools/testing/selftests/bpf/prog_tests/rcu_read_lock.c @@ -29,6 +29,10 @@ static void test_success(void) bpf_program__set_autoload(skel->progs.non_sleepable_1, true); bpf_program__set_autoload(skel->progs.non_sleepable_2, true); bpf_program__set_autoload(skel->progs.task_trusted_non_rcuptr, true); + bpf_program__set_autoload(skel->progs.rcu_read_lock_subprog, true); + bpf_program__set_autoload(skel->progs.rcu_read_lock_global_subprog, true); + bpf_program__set_autoload(skel->progs.rcu_read_lock_subprog_lock, true); + bpf_program__set_autoload(skel->progs.rcu_read_lock_subprog_unlock, true); err = rcu_read_lock__load(skel); if (!ASSERT_OK(err, "skel_load")) goto out; @@ -75,6 +79,8 @@ static const char * const inproper_region_tests[] = { "inproper_sleepable_helper", "inproper_sleepable_kfunc", "nested_rcu_region", + "rcu_read_lock_global_subprog_lock", + "rcu_read_lock_global_subprog_unlock", }; static void test_inproper_region(void) diff --git a/tools/testing/selftests/bpf/progs/rcu_read_lock.c b/tools/testing/selftests/bpf/progs/rcu_read_lock.c index 14fb01437fb8..ab3a532b7dd6 100644 --- a/tools/testing/selftests/bpf/progs/rcu_read_lock.c +++ b/tools/testing/selftests/bpf/progs/rcu_read_lock.c @@ -319,3 +319,123 @@ int cross_rcu_region(void *ctx) bpf_rcu_read_unlock(); return 0; } + +__noinline +static int static_subprog(void *ctx) +{ + volatile int ret = 0; + + if (bpf_get_prandom_u32()) + return ret + 42; + return ret + bpf_get_prandom_u32(); +} + +__noinline +int global_subprog(u64 a) +{ + volatile int ret = a; + + return ret + static_subprog(NULL); +} + +__noinline +static int static_subprog_lock(void *ctx) +{ + volatile int ret = 0; + + bpf_rcu_read_lock(); + if (bpf_get_prandom_u32()) + return ret + 42; + return ret + bpf_get_prandom_u32(); +} + +__noinline +int global_subprog_lock(u64 a) +{ + volatile int ret = a; + + return ret + static_subprog_lock(NULL); +} + +__noinline +static int static_subprog_unlock(void *ctx) +{ + volatile int ret = 0; + + bpf_rcu_read_unlock(); + if (bpf_get_prandom_u32()) + return 
ret + 42; + return ret + bpf_get_prandom_u32(); +} + +__noinline +int global_subprog_unlock(u64 a) +{ + volatile int ret = a; + + return ret + static_subprog_unlock(NULL); +} + +SEC("?fentry.s/" SYS_PREFIX "sys_getpgid") +int rcu_read_lock_subprog(void *ctx) +{ + volatile int ret = 0; + + bpf_rcu_read_lock(); + if (bpf_get_prandom_u32()) + ret += static_subprog(ctx); + bpf_rcu_read_unlock(); + return 0; +} + +SEC("?fentry.s/" SYS_PREFIX "sys_getpgid") +int rcu_read_lock_global_subprog(void *ctx) +{ + volatile int ret = 0; + + bpf_rcu_read_lock(); + if (bpf_get_prandom_u32()) + ret += global_subprog(ret); + bpf_rcu_read_unlock(); + return 0; +} + +SEC("?fentry.s/" SYS_PREFIX "sys_getpgid") +int rcu_read_lock_subprog_lock(void *ctx) +{ + volatile int ret = 0; + + ret += static_subprog_lock(ctx); + bpf_rcu_read_unlock(); + return 0; +} + +SEC("?fentry.s/" SYS_PREFIX "sys_getpgid") +int rcu_read_lock_global_subprog_lock(void *ctx) +{ + volatile int ret = 0; + + ret += global_subprog_lock(ret); + bpf_rcu_read_unlock(); + return 0; +} + +SEC("?fentry.s/" SYS_PREFIX "sys_getpgid") +int rcu_read_lock_subprog_unlock(void *ctx) +{ + volatile int ret = 0; + + bpf_rcu_read_lock(); + ret += static_subprog_unlock(ctx); + return 0; +} + +SEC("?fentry.s/" SYS_PREFIX "sys_getpgid") +int rcu_read_lock_global_subprog_unlock(void *ctx) +{ + volatile int ret = 0; + + bpf_rcu_read_lock(); + ret += global_subprog_unlock(ret); + return 0; +} -- 2.34.1
From: Geliang Tang <tanggeliang@kylinos.cn> mainline inclusion from mainline-v6.9-rc1 commit 947e56f82fd783a1ec1c9359b20b5699d09cae14 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Similar to the handling in the functions __register_btf_kfunc_id_set() and register_btf_id_dtor_kfuncs(), this patch uses the newly added helper check_btf_kconfigs() to handle a module whose BTF section has been stripped. While at it, the patch also adds the missing IS_ERR() check to fix commit f6be98d19985 ("bpf, net: switch to dynamic registration") Fixes: f6be98d19985 ("bpf, net: switch to dynamic registration") Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Link: https://lore.kernel.org/r/69082b9835463fe36f9e354bddf2d0a97df39c2b.170737330... Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- kernel/bpf/btf.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index 2f8cfd27bd9e..ddde130786f7 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -9000,7 +9000,9 @@ int __register_bpf_struct_ops(struct bpf_struct_ops *st_ops) btf = btf_get_module_btf(st_ops->owner); if (!btf) - return -EINVAL; + return check_btf_kconfigs(st_ops->owner, "struct_ops"); + if (IS_ERR(btf)) + return PTR_ERR(btf); log = kzalloc(sizeof(*log), GFP_KERNEL | __GFP_NOWARN); if (!log) { -- 2.34.1
From: Kui-Feng Lee <thinker.li@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit 77c0208e199ccb0986fb3612f2409c8cdcb036ad category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Enable the providers to use types defined in a module instead of in the kernel (btf_vmlinux). Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240209023750.1153905-2-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- include/linux/bpf.h | 1 + kernel/bpf/btf.c | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 0624b33be7a2..986fef28f5b4 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1488,6 +1488,7 @@ struct bpf_jit_poke_descriptor { struct bpf_ctx_arg_aux { u32 offset; enum bpf_reg_type reg_type; + struct btf *btf; u32 btf_id; }; diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index ddde130786f7..6fad9bb2281e 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -6302,7 +6302,7 @@ bool btf_ctx_access(int off, int size, enum bpf_access_type type, } info->reg_type = ctx_arg_info->reg_type; - info->btf = btf_vmlinux; + info->btf = ctx_arg_info->btf ? : btf_vmlinux; info->btf_id = ctx_arg_info->btf_id; return true; } -- 2.34.1
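As a rough sketch of what this enables (names are placeholders, not from this series): a provider can now hand the verifier argument info that resolves against a module's BTF instead of vmlinux BTF.

/* sketch: fill ctx arg info that points at a module's BTF; before this
 * change btf_ctx_access() always resolved info->btf_id against btf_vmlinux. */
static void fill_module_ctx_arg(struct bpf_ctx_arg_aux *info, struct btf *mod_btf,
				u32 type_id, u32 ctx_off)
{
	info->offset   = ctx_off;			/* trampoline ctx offset of the arg */
	info->reg_type = PTR_TO_BTF_ID | PTR_TRUSTED;	/* how the verifier sees it */
	info->btf      = mod_btf;			/* NULL falls back to btf_vmlinux */
	info->btf_id   = type_id;			/* type defined in the module */
}

The later struct_ops patches in this series fill these fields the same way for nullable arguments.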
hulk inclusion category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA -------------------------------- Fix kabi-breakage by using KABI_BROKEN_INSERT. Fixes: 0d52d64ea93d ("bpf: add btf pointer to struct bpf_ctx_arg_aux.") Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- include/linux/bpf.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 986fef28f5b4..5313a8a42420 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1488,8 +1488,8 @@ struct bpf_jit_poke_descriptor { struct bpf_ctx_arg_aux { u32 offset; enum bpf_reg_type reg_type; - struct btf *btf; u32 btf_id; + KABI_BROKEN_INSERT(struct btf *btf) }; struct btf_mod_pair { -- 2.34.1
From: Kui-Feng Lee <thinker.li@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit 6115a0aeef01aef152ad7738393aad11422bfb82 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Move __kfunc_param_match_suffix() to btf.c and rename it as btf_param_match_suffix(). It can be reused by bpf_struct_ops later. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240209023750.1153905-3-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Conflicts: kernel/bpf/verifier.c [Fix conflicts because of missing some funcs.] Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- include/linux/btf.h | 4 ++++ kernel/bpf/btf.c | 18 ++++++++++++++++++ kernel/bpf/verifier.c | 36 +++++++++--------------------------- 3 files changed, 31 insertions(+), 27 deletions(-) diff --git a/include/linux/btf.h b/include/linux/btf.h index 79f34103ab5d..c9e3e0786648 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -495,6 +495,10 @@ static inline void *btf_id_set8_contains(const struct btf_id_set8 *set, u32 id) return bsearch(&id, set->pairs, set->cnt, sizeof(set->pairs[0]), btf_id_cmp_func); } +bool btf_param_match_suffix(const struct btf *btf, + const struct btf_param *arg, + const char *suffix); + struct bpf_verifier_log; #if defined(CONFIG_BPF_JIT) && defined(CONFIG_BPF_SYSCALL) diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index 6fad9bb2281e..aa02f1dc9029 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -9022,3 +9022,21 @@ int __register_bpf_struct_ops(struct bpf_struct_ops *st_ops) } EXPORT_SYMBOL_GPL(__register_bpf_struct_ops); #endif + +bool btf_param_match_suffix(const struct btf *btf, + const struct btf_param *arg, + const char *suffix) +{ + int suffix_len = strlen(suffix), len; + const char *param_name; + + /* In the future, this can be ported to use BTF tagging */ + param_name = btf_name_by_offset(btf, arg->name_off); + if (str_is_empty(param_name)) + return false; + len = strlen(param_name); + if (len <= suffix_len) + return false; + param_name += len - suffix_len; + return !strncmp(param_name, suffix, suffix_len); +} diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 0aae4ff45aee..bb3f3ff80d15 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -10634,24 +10634,6 @@ static bool is_kfunc_rcu_protected(struct bpf_kfunc_call_arg_meta *meta) return meta->kfunc_flags & KF_RCU_PROTECTED; } -static bool __kfunc_param_match_suffix(const struct btf *btf, - const struct btf_param *arg, - const char *suffix) -{ - int suffix_len = strlen(suffix), len; - const char *param_name; - - /* In the future, this can be ported to use BTF tagging */ - param_name = btf_name_by_offset(btf, arg->name_off); - if (str_is_empty(param_name)) - return false; - len = strlen(param_name); - if (len < suffix_len) - return false; - param_name += len - suffix_len; - return !strncmp(param_name, suffix, suffix_len); -} - static bool is_kfunc_arg_mem_size(const struct btf *btf, const struct btf_param *arg, const struct bpf_reg_state *reg) @@ -10662,7 +10644,7 @@ static bool is_kfunc_arg_mem_size(const struct btf *btf, if (!btf_type_is_scalar(t) || reg->type != SCALAR_VALUE) return false; - return __kfunc_param_match_suffix(btf, arg, "__sz"); + return btf_param_match_suffix(btf, arg, "__sz"); } static bool is_kfunc_arg_const_mem_size(const struct 
btf *btf, @@ -10675,42 +10657,42 @@ static bool is_kfunc_arg_const_mem_size(const struct btf *btf, if (!btf_type_is_scalar(t) || reg->type != SCALAR_VALUE) return false; - return __kfunc_param_match_suffix(btf, arg, "__szk"); + return btf_param_match_suffix(btf, arg, "__szk"); } static bool is_kfunc_arg_optional(const struct btf *btf, const struct btf_param *arg) { - return __kfunc_param_match_suffix(btf, arg, "__opt"); + return btf_param_match_suffix(btf, arg, "__opt"); } static bool is_kfunc_arg_constant(const struct btf *btf, const struct btf_param *arg) { - return __kfunc_param_match_suffix(btf, arg, "__k"); + return btf_param_match_suffix(btf, arg, "__k"); } static bool is_kfunc_arg_ignore(const struct btf *btf, const struct btf_param *arg) { - return __kfunc_param_match_suffix(btf, arg, "__ign"); + return btf_param_match_suffix(btf, arg, "__ign"); } static bool is_kfunc_arg_alloc_obj(const struct btf *btf, const struct btf_param *arg) { - return __kfunc_param_match_suffix(btf, arg, "__alloc"); + return btf_param_match_suffix(btf, arg, "__alloc"); } static bool is_kfunc_arg_uninit(const struct btf *btf, const struct btf_param *arg) { - return __kfunc_param_match_suffix(btf, arg, "__uninit"); + return btf_param_match_suffix(btf, arg, "__uninit"); } static bool is_kfunc_arg_refcounted_kptr(const struct btf *btf, const struct btf_param *arg) { - return __kfunc_param_match_suffix(btf, arg, "__refcounted_kptr"); + return btf_param_match_suffix(btf, arg, "__refcounted_kptr"); } static bool is_kfunc_arg_nullable(const struct btf *btf, const struct btf_param *arg) { - return __kfunc_param_match_suffix(btf, arg, "__nullable"); + return btf_param_match_suffix(btf, arg, "__nullable"); } static bool is_kfunc_arg_scalar_with_name(const struct btf *btf, -- 2.34.1
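For context, the suffixes this helper matches are naming conventions carried by kfunc parameter names. A hypothetical kfunc (not a real kernel function) using two of them might look like:

/* "mem__sz" marks the scalar as the size of the preceding memory argument;
 * "task__nullable" marks a pointer the caller may pass as NULL. */
__bpf_kfunc int bpf_example_kfunc(void *mem, u32 mem__sz,
				  struct task_struct *task__nullable)
{
	return task__nullable ? (int)mem__sz : 0;
}

The next patch reuses the same helper to recognize the "__nullable" suffix on struct_ops stub functions.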
From: Kui-Feng Lee <thinker.li@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit 1611603537a4b88cec7993f32b70c03113801a46 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Collect argument information from the type information of stub functions to mark arguments of BPF struct_ops programs with PTR_MAYBE_NULL if they are nullable. A nullable argument is annotated by suffixing "__nullable" at the argument name of the stub function. For nullable arguments, this patch sets a struct bpf_ctx_arg_aux to label their reg_type with PTR_TO_BTF_ID | PTR_TRUSTED | PTR_MAYBE_NULL. This makes the verifier check the programs and ensure that they properly check the pointer. The programs should check if the pointer is null before accessing the pointed memory. The implementer of a struct_ops type should annotate the arguments that can be null. The implementer should define a stub function (empty) as a placeholder for each defined operator. The name of a stub function should be in the pattern "<st_op_type>__<operator name>". For example, for test_maybe_null of struct bpf_testmod_ops, its stub function name should be "bpf_testmod_ops__test_maybe_null". An argument is marked nullable by suffixing the argument name with "__nullable" in the stub function. Since we already have stub functions for kCFI, we just reuse these stub functions with the naming convention mentioned earlier. Stub functions with this naming convention are only required if there are nullable arguments to annotate. For functions with no nullable arguments, stub functions are not necessary for the purpose of this patch. This patch will prepare a list of struct bpf_ctx_arg_aux, aka arg_info, for each member field of a struct_ops type. "arg_info" will be assigned to "prog->aux->ctx_arg_info" of BPF struct_ops programs in check_struct_ops_btf_id() so that it can be used by btf_ctx_access() later to set reg_type properly for the verifier. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240209023750.1153905-4-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Conflicts: include/linux/bpf.h kernel/bpf/btf.c [Fix conflicts because of kabi and context diff.] Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- include/linux/bpf.h | 21 ++++ include/linux/btf.h | 2 + kernel/bpf/bpf_struct_ops.c | 213 ++++++++++++++++++++++++++++++++++-- kernel/bpf/btf.c | 27 +++++ kernel/bpf/verifier.c | 6 + 5 files changed, 257 insertions(+), 12 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 5313a8a42420..6bb262b615ff 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1806,6 +1806,19 @@ struct bpf_struct_ops { KABI_RESERVE(4) }; +/* Every member of a struct_ops type has an instance even a member is not + * an operator (function pointer). The "info" field will be assigned to + * prog->aux->ctx_arg_info of BPF struct_ops programs to provide the + * argument information required by the verifier to verify the program. + * + * btf_ctx_access() will lookup prog->aux->ctx_arg_info to find the + * corresponding entry for an given argument.
+ */ +struct bpf_struct_ops_arg_info { + struct bpf_ctx_arg_aux *info; + u32 cnt; +}; + struct bpf_struct_ops_desc { struct bpf_struct_ops *st_ops; @@ -1813,6 +1826,9 @@ struct bpf_struct_ops_desc { const struct btf_type *value_type; u32 type_id; u32 value_id; + + /* Collection of argument information for each member */ + struct bpf_struct_ops_arg_info *arg_info; }; enum bpf_struct_ops_state { @@ -1887,6 +1903,7 @@ int bpf_struct_ops_desc_init(struct bpf_struct_ops_desc *st_ops_desc, struct btf *btf, struct bpf_verifier_log *log); void bpf_map_struct_ops_info_fill(struct bpf_map_info *info, struct bpf_map *map); +void bpf_struct_ops_desc_release(struct bpf_struct_ops_desc *st_ops_desc); #else #define register_bpf_struct_ops(st_ops, type) ({ (void *)(st_ops); 0; }) static inline bool bpf_try_module_get(const void *data, struct module *owner) @@ -1911,6 +1928,10 @@ static inline void bpf_map_struct_ops_info_fill(struct bpf_map_info *info, struc { } +static inline void bpf_struct_ops_desc_release(struct bpf_struct_ops_desc *st_ops_desc) +{ +} + #endif #if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_BPF_LSM) diff --git a/include/linux/btf.h b/include/linux/btf.h index c9e3e0786648..91c55f350357 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -498,6 +498,8 @@ static inline void *btf_id_set8_contains(const struct btf_id_set8 *set, u32 id) bool btf_param_match_suffix(const struct btf *btf, const struct btf_param *arg, const char *suffix); +int btf_ctx_arg_offset(const struct btf *btf, const struct btf_type *func_proto, + u32 arg_no); struct bpf_verifier_log; diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c index f98f580de77a..0d7be97a2411 100644 --- a/kernel/bpf/bpf_struct_ops.c +++ b/kernel/bpf/bpf_struct_ops.c @@ -116,17 +116,183 @@ static bool is_valid_value_type(struct btf *btf, s32 value_id, return true; } +#define MAYBE_NULL_SUFFIX "__nullable" +#define MAX_STUB_NAME 128 + +/* Return the type info of a stub function, if it exists. + * + * The name of a stub function is made up of the name of the struct_ops and + * the name of the function pointer member, separated by "__". For example, + * if the struct_ops type is named "foo_ops" and the function pointer + * member is named "bar", the stub function name would be "foo_ops__bar". + */ +static const struct btf_type * +find_stub_func_proto(const struct btf *btf, const char *st_op_name, + const char *member_name) +{ + char stub_func_name[MAX_STUB_NAME]; + const struct btf_type *func_type; + s32 btf_id; + int cp; + + cp = snprintf(stub_func_name, MAX_STUB_NAME, "%s__%s", + st_op_name, member_name); + if (cp >= MAX_STUB_NAME) { + pr_warn("Stub function name too long\n"); + return NULL; + } + btf_id = btf_find_by_name_kind(btf, stub_func_name, BTF_KIND_FUNC); + if (btf_id < 0) + return NULL; + func_type = btf_type_by_id(btf, btf_id); + if (!func_type) + return NULL; + + return btf_type_by_id(btf, func_type->type); /* FUNC_PROTO */ +} + +/* Prepare argument info for every nullable argument of a member of a + * struct_ops type. + * + * Initialize a struct bpf_struct_ops_arg_info according to type info of + * the arguments of a stub function. (Check kCFI for more information about + * stub functions.) + * + * Each member in the struct_ops type has a struct bpf_struct_ops_arg_info + * to provide an array of struct bpf_ctx_arg_aux, which in turn provides + * the information that used by the verifier to check the arguments of the + * BPF struct_ops program assigned to the member. 
Here, we only care about + * the arguments that are marked as __nullable. + * + * The array of struct bpf_ctx_arg_aux is eventually assigned to + * prog->aux->ctx_arg_info of BPF struct_ops programs and passed to the + * verifier. (See check_struct_ops_btf_id()) + * + * arg_info->info will be the list of struct bpf_ctx_arg_aux if success. If + * fails, it will be kept untouched. + */ +static int prepare_arg_info(struct btf *btf, + const char *st_ops_name, + const char *member_name, + const struct btf_type *func_proto, + struct bpf_struct_ops_arg_info *arg_info) +{ + const struct btf_type *stub_func_proto, *pointed_type; + const struct btf_param *stub_args, *args; + struct bpf_ctx_arg_aux *info, *info_buf; + u32 nargs, arg_no, info_cnt = 0; + u32 arg_btf_id; + int offset; + + stub_func_proto = find_stub_func_proto(btf, st_ops_name, member_name); + if (!stub_func_proto) + return 0; + + /* Check if the number of arguments of the stub function is the same + * as the number of arguments of the function pointer. + */ + nargs = btf_type_vlen(func_proto); + if (nargs != btf_type_vlen(stub_func_proto)) { + pr_warn("the number of arguments of the stub function %s__%s does not match the number of arguments of the member %s of struct %s\n", + st_ops_name, member_name, member_name, st_ops_name); + return -EINVAL; + } + + if (!nargs) + return 0; + + args = btf_params(func_proto); + stub_args = btf_params(stub_func_proto); + + info_buf = kcalloc(nargs, sizeof(*info_buf), GFP_KERNEL); + if (!info_buf) + return -ENOMEM; + + /* Prepare info for every nullable argument */ + info = info_buf; + for (arg_no = 0; arg_no < nargs; arg_no++) { + /* Skip arguments that is not suffixed with + * "__nullable". + */ + if (!btf_param_match_suffix(btf, &stub_args[arg_no], + MAYBE_NULL_SUFFIX)) + continue; + + /* Should be a pointer to struct */ + pointed_type = btf_type_resolve_ptr(btf, + args[arg_no].type, + &arg_btf_id); + if (!pointed_type || + !btf_type_is_struct(pointed_type)) { + pr_warn("stub function %s__%s has %s tagging to an unsupported type\n", + st_ops_name, member_name, MAYBE_NULL_SUFFIX); + goto err_out; + } + + offset = btf_ctx_arg_offset(btf, func_proto, arg_no); + if (offset < 0) { + pr_warn("stub function %s__%s has an invalid trampoline ctx offset for arg#%u\n", + st_ops_name, member_name, arg_no); + goto err_out; + } + + if (args[arg_no].type != stub_args[arg_no].type) { + pr_warn("arg#%u type in stub function %s__%s does not match with its original func_proto\n", + arg_no, st_ops_name, member_name); + goto err_out; + } + + /* Fill the information of the new argument */ + info->reg_type = + PTR_TRUSTED | PTR_TO_BTF_ID | PTR_MAYBE_NULL; + info->btf_id = arg_btf_id; + info->btf = btf; + info->offset = offset; + + info++; + info_cnt++; + } + + if (info_cnt) { + arg_info->info = info_buf; + arg_info->cnt = info_cnt; + } else { + kfree(info_buf); + } + + return 0; + +err_out: + kfree(info_buf); + + return -EINVAL; +} + +/* Clean up the arg_info in a struct bpf_struct_ops_desc. 
*/ +void bpf_struct_ops_desc_release(struct bpf_struct_ops_desc *st_ops_desc) +{ + struct bpf_struct_ops_arg_info *arg_info; + int i; + + arg_info = st_ops_desc->arg_info; + for (i = 0; i < btf_type_vlen(st_ops_desc->type); i++) + kfree(arg_info[i].info); + + kfree(arg_info); +} + int bpf_struct_ops_desc_init(struct bpf_struct_ops_desc *st_ops_desc, struct btf *btf, struct bpf_verifier_log *log) { struct bpf_struct_ops *st_ops = st_ops_desc->st_ops; + struct bpf_struct_ops_arg_info *arg_info; const struct btf_member *member; const struct btf_type *t; s32 type_id, value_id; char value_name[128]; const char *mname; - int i; + int i, err; if (strlen(st_ops->name) + VALUE_PREFIX_LEN >= sizeof(value_name)) { @@ -160,6 +326,17 @@ int bpf_struct_ops_desc_init(struct bpf_struct_ops_desc *st_ops_desc, if (!is_valid_value_type(btf, value_id, t, value_name)) return -EINVAL; + arg_info = kcalloc(btf_type_vlen(t), sizeof(*arg_info), + GFP_KERNEL); + if (!arg_info) + return -ENOMEM; + + st_ops_desc->arg_info = arg_info; + st_ops_desc->type = t; + st_ops_desc->type_id = type_id; + st_ops_desc->value_id = value_id; + st_ops_desc->value_type = btf_type_by_id(btf, value_id); + for_each_member(i, t, member) { const struct btf_type *func_proto; @@ -167,40 +344,52 @@ int bpf_struct_ops_desc_init(struct bpf_struct_ops_desc *st_ops_desc, if (!*mname) { pr_warn("anon member in struct %s is not supported\n", st_ops->name); - return -EOPNOTSUPP; + err = -EOPNOTSUPP; + goto errout; } if (__btf_member_bitfield_size(t, member)) { pr_warn("bit field member %s in struct %s is not supported\n", mname, st_ops->name); - return -EOPNOTSUPP; + err = -EOPNOTSUPP; + goto errout; } func_proto = btf_type_resolve_func_ptr(btf, member->type, NULL); - if (func_proto && - btf_distill_func_proto(log, btf, + if (!func_proto) + continue; + + if (btf_distill_func_proto(log, btf, func_proto, mname, &st_ops->func_models[i])) { pr_warn("Error in parsing func ptr %s in struct %s\n", mname, st_ops->name); - return -EINVAL; + err = -EINVAL; + goto errout; } + + err = prepare_arg_info(btf, st_ops->name, mname, + func_proto, + arg_info + i); + if (err) + goto errout; } if (st_ops->init(btf)) { pr_warn("Error in init bpf_struct_ops %s\n", st_ops->name); - return -EINVAL; + err = -EINVAL; + goto errout; } - st_ops_desc->type_id = type_id; - st_ops_desc->type = t; - st_ops_desc->value_id = value_id; - st_ops_desc->value_type = btf_type_by_id(btf, value_id); - return 0; + +errout: + bpf_struct_ops_desc_release(st_ops_desc); + + return err; } static int bpf_struct_ops_map_get_next_key(struct bpf_map *map, void *key, diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index aa02f1dc9029..ed7d054d6e68 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -1712,6 +1712,13 @@ static void btf_free_struct_meta_tab(struct btf *btf) static void btf_free_struct_ops_tab(struct btf *btf) { struct btf_struct_ops_tab *tab = btf->struct_ops_tab; + u32 i; + + if (!tab) + return; + + for (i = 0; i < tab->cnt; i++) + bpf_struct_ops_desc_release(&tab->ops[i]); kfree(tab); btf->struct_ops_tab = NULL; @@ -6153,6 +6160,26 @@ static const struct bpf_raw_tp_null_args raw_tp_null_args[] = { { "amdtp_packet", 0x100 }, }; +int btf_ctx_arg_offset(const struct btf *btf, const struct btf_type *func_proto, + u32 arg_no) +{ + const struct btf_param *args; + const struct btf_type *t; + int off = 0, i; + u32 sz; + + args = btf_params(func_proto); + for (i = 0; i < arg_no; i++) { + t = btf_type_by_id(btf, args[i].type); + t = btf_resolve_size(btf, t, &sz); + if (IS_ERR(t)) + return 
PTR_ERR(t); + off += roundup(sz, 8); + } + + return off; +} + bool btf_ctx_access(int off, int size, enum bpf_access_type type, const struct bpf_prog *prog, struct bpf_insn_access_aux *info) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index bb3f3ff80d15..bfd184fc7004 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -20258,6 +20258,12 @@ static int check_struct_ops_btf_id(struct bpf_verifier_env *env) } } + /* btf_ctx_access() used this to provide argument type info */ + prog->aux->ctx_arg_info = + st_ops_desc->arg_info[member_idx].info; + prog->aux->ctx_arg_info_size = + st_ops_desc->arg_info[member_idx].cnt; + prog->aux->attach_func_proto = func_proto; prog->aux->attach_func_name = mname; env->ops = st_ops->verifier_ops; -- 2.34.1
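Concretely, for a hypothetical struct_ops type named foo_ops (the name used in the in-code comment above), the stub-based annotation looks like the sketch below; the real usage added to bpf_testmod follows in the next patch.

/* hypothetical struct_ops type with one operator taking a nullable pointer */
struct foo_ops {
	int (*bar)(struct task_struct *task, int cnt);
};

/* kCFI stub reused as the annotation carrier: prepare_arg_info() looks up
 * "foo_ops__bar" and reads the __nullable suffix off its first argument. */
static int foo_ops__bar(struct task_struct *task__nullable, int cnt)
{
	return 0;
}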
From: Kui-Feng Lee <thinker.li@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit 00f239eccf461a6403b3c16e767d04f3954cae98 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Test if the verifier verifies nullable pointer arguments correctly for BPF struct_ops programs. "test_maybe_null" in struct bpf_testmod_ops is the operator defined for the test cases here. A BPF program should check a pointer for NULL beforehand to access the value pointed by the nullable pointer arguments, or the verifier should reject the programs. The test here includes two parts; the programs checking pointers properly and the programs not checking pointers beforehand. The test checks if the verifier accepts the programs checking properly and rejects the programs not checking at all. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240209023750.1153905-5-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- .../selftests/bpf/bpf_testmod/bpf_testmod.c | 13 +++++- .../selftests/bpf/bpf_testmod/bpf_testmod.h | 4 ++ .../prog_tests/test_struct_ops_maybe_null.c | 46 +++++++++++++++++++ .../bpf/progs/struct_ops_maybe_null.c | 29 ++++++++++++ .../bpf/progs/struct_ops_maybe_null_fail.c | 24 ++++++++++ 5 files changed, 115 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/bpf/prog_tests/test_struct_ops_maybe_null.c create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_maybe_null.c create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_maybe_null_fail.c diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c index 2e1b5d8f0fa2..79238005edd1 100644 --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c @@ -550,7 +550,11 @@ static int bpf_dummy_reg(void *kdata) { struct bpf_testmod_ops *ops = kdata; - ops->test_2(4, 3); + /* Some test cases (ex. struct_ops_maybe_null) may not have test_2 + * initialized, so we need to check for NULL. + */ + if (ops->test_2) + ops->test_2(4, 3); return 0; } @@ -568,9 +572,16 @@ static void bpf_testmod_test_2(int a, int b) { } +static int bpf_testmod_ops__test_maybe_null(int dummy, + struct task_struct *task__nullable) +{ + return 0; +} + static struct bpf_testmod_ops __bpf_testmod_ops = { .test_1 = bpf_testmod_test_1, .test_2 = bpf_testmod_test_2, + .test_maybe_null = bpf_testmod_ops__test_maybe_null, }; struct bpf_struct_ops bpf_bpf_testmod_ops = { diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h index 537beca42896..c3b0cf788f9f 100644 --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h @@ -5,6 +5,8 @@ #include <linux/types.h> +struct task_struct; + struct bpf_testmod_test_read_ctx { char *buf; loff_t off; @@ -31,6 +33,8 @@ struct bpf_iter_testmod_seq { struct bpf_testmod_ops { int (*test_1)(void); void (*test_2)(int a, int b); + /* Used to test nullable arguments. 
*/ + int (*test_maybe_null)(int dummy, struct task_struct *task); }; #endif /* _BPF_TESTMOD_H */ diff --git a/tools/testing/selftests/bpf/prog_tests/test_struct_ops_maybe_null.c b/tools/testing/selftests/bpf/prog_tests/test_struct_ops_maybe_null.c new file mode 100644 index 000000000000..01dc2613c8a5 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/test_struct_ops_maybe_null.c @@ -0,0 +1,46 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */ +#include <test_progs.h> + +#include "struct_ops_maybe_null.skel.h" +#include "struct_ops_maybe_null_fail.skel.h" + +/* Test that the verifier accepts a program that access a nullable pointer + * with a proper check. + */ +static void maybe_null(void) +{ + struct struct_ops_maybe_null *skel; + + skel = struct_ops_maybe_null__open_and_load(); + if (!ASSERT_OK_PTR(skel, "struct_ops_module_open_and_load")) + return; + + struct_ops_maybe_null__destroy(skel); +} + +/* Test that the verifier rejects a program that access a nullable pointer + * without a check beforehand. + */ +static void maybe_null_fail(void) +{ + struct struct_ops_maybe_null_fail *skel; + + skel = struct_ops_maybe_null_fail__open_and_load(); + if (ASSERT_ERR_PTR(skel, "struct_ops_module_fail__open_and_load")) + return; + + struct_ops_maybe_null_fail__destroy(skel); +} + +void test_struct_ops_maybe_null(void) +{ + /* The verifier verifies the programs at load time, so testing both + * programs in the same compile-unit is complicated. We run them in + * separate objects to simplify the testing. + */ + if (test__start_subtest("maybe_null")) + maybe_null(); + if (test__start_subtest("maybe_null_fail")) + maybe_null_fail(); +} diff --git a/tools/testing/selftests/bpf/progs/struct_ops_maybe_null.c b/tools/testing/selftests/bpf/progs/struct_ops_maybe_null.c new file mode 100644 index 000000000000..b450f72e744a --- /dev/null +++ b/tools/testing/selftests/bpf/progs/struct_ops_maybe_null.c @@ -0,0 +1,29 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */ +#include <vmlinux.h> +#include <bpf/bpf_tracing.h> +#include "../bpf_testmod/bpf_testmod.h" + +char _license[] SEC("license") = "GPL"; + +pid_t tgid = 0; + +/* This is a test BPF program that uses struct_ops to access an argument + * that may be NULL. This is a test for the verifier to ensure that it can + * rip PTR_MAYBE_NULL correctly. + */ +SEC("struct_ops/test_maybe_null") +int BPF_PROG(test_maybe_null, int dummy, + struct task_struct *task) +{ + if (task) + tgid = task->tgid; + + return 0; +} + +SEC(".struct_ops.link") +struct bpf_testmod_ops testmod_1 = { + .test_maybe_null = (void *)test_maybe_null, +}; + diff --git a/tools/testing/selftests/bpf/progs/struct_ops_maybe_null_fail.c b/tools/testing/selftests/bpf/progs/struct_ops_maybe_null_fail.c new file mode 100644 index 000000000000..6283099ec383 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/struct_ops_maybe_null_fail.c @@ -0,0 +1,24 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. 
*/ +#include <vmlinux.h> +#include <bpf/bpf_tracing.h> +#include "../bpf_testmod/bpf_testmod.h" + +char _license[] SEC("license") = "GPL"; + +pid_t tgid = 0; + +SEC("struct_ops/test_maybe_null_struct_ptr") +int BPF_PROG(test_maybe_null_struct_ptr, int dummy, + struct task_struct *task) +{ + tgid = task->tgid; + + return 0; +} + +SEC(".struct_ops.link") +struct bpf_testmod_ops testmod_struct_ptr = { + .test_maybe_null = (void *)test_maybe_null_struct_ptr, +}; + -- 2.34.1
From: Benjamin Tissoires <bentiss@kernel.org> mainline inclusion from mainline-v6.9-rc1 commit dfe6625df48ec54c6dc9b86d361f26962d09de88 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- No code change, but it'll allow to have only one place to change everything when we add in_sleepable in cur_state. Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/20240221-hid-bpf-sleepable-v3-2-1fb378ca6301@kerne... Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- kernel/bpf/verifier.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index bfd184fc7004..e8ed32672d95 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -5423,6 +5423,11 @@ static int map_kptr_match_type(struct bpf_verifier_env *env, return -EINVAL; } +static bool in_sleepable(struct bpf_verifier_env *env) +{ + return env->prog->aux->sleepable; +} + /* The non-sleepable programs and sleepable programs with explicit bpf_rcu_read_lock() * can dereference RCU protected pointers and result is PTR_TRUSTED. */ @@ -5430,7 +5435,7 @@ static bool in_rcu_cs(struct bpf_verifier_env *env) { return env->cur_state->active_rcu_lock || env->cur_state->active_lock.ptr || - !env->prog->aux->sleepable; + !in_sleepable(env); } /* Once GCC supports btf_type_tag the following mechanism will be replaced with tag check */ @@ -10154,7 +10159,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn return -EINVAL; } - if (!env->prog->aux->sleepable && fn->might_sleep) { + if (!in_sleepable(env) && fn->might_sleep) { verbose(env, "helper call might sleep in a non-sleepable prog\n"); return -EINVAL; } @@ -10184,7 +10189,7 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn return -EINVAL; } - if (env->prog->aux->sleepable && is_storage_get_function(func_id)) + if (in_sleepable(env) && is_storage_get_function(func_id)) env->insn_aux_data[insn_idx].storage_get_func_atomic = true; } @@ -11479,7 +11484,7 @@ static bool check_css_task_iter_allowlist(struct bpf_verifier_env *env) return true; fallthrough; default: - return env->prog->aux->sleepable; + return in_sleepable(env); } } @@ -12002,7 +12007,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, } sleepable = is_kfunc_sleepable(&meta); - if (sleepable && !env->prog->aux->sleepable) { + if (sleepable && !in_sleepable(env)) { verbose(env, "program must be sleepable to call sleepable kfunc %s\n", func_name); return -EACCES; } @@ -19582,7 +19587,7 @@ static int do_misc_fixups(struct bpf_verifier_env *env) } if (is_storage_get_function(insn->imm)) { - if (!env->prog->aux->sleepable || + if (!in_sleepable(env) || env->insn_aux_data[i + delta].storage_get_func_atomic) insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_ATOMIC); else -- 2.34.1
From: Kui-Feng Lee <thinker.li@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit 3644d285462a60c80ac225d508fcfe705640d2b4 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- For a struct_ops map, btf_value_type_id is the type ID of its struct type. This value is required by bpftool to generate a skeleton that includes pointers to shadow types. The code generator gets the type ID from bpf_map__btf_value_type_id() in order to get the type information of the struct type of a map. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240229064523.2091270-2-thinker.li@gmail.com Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- tools/lib/bpf/libbpf.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index 072f72c1b40c..dc3aecf1f070 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -1214,6 +1214,7 @@ static int init_struct_ops_maps(struct bpf_object *obj, const char *sec_name, map->name = strdup(var_name); if (!map->name) return -ENOMEM; + map->btf_value_type_id = type_id; map->def.type = BPF_MAP_TYPE_STRUCT_OPS; map->def.key_size = sizeof(int); @@ -5193,6 +5194,10 @@ static int bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map, b create_attr.btf_value_type_id = 0; map->btf_key_type_id = 0; map->btf_value_type_id = 0; + break; + case BPF_MAP_TYPE_STRUCT_OPS: + create_attr.btf_value_type_id = 0; + break; default: break; } -- 2.34.1
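A minimal userspace sketch of how a code generator (or any libbpf user) can resolve the struct type behind a struct_ops map once this value is populated; the helper name is illustrative:

#include <bpf/libbpf.h>
#include <bpf/btf.h>

static const struct btf_type *struct_ops_value_type(struct bpf_object *obj,
						    struct bpf_map *map)
{
	__u32 id = bpf_map__btf_value_type_id(map);

	if (!id)
		return NULL;	/* not set, e.g. not a struct_ops map */
	return btf__type_by_id(bpf_object__btf(obj), id);
}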
From: Kui-Feng Lee <thinker.li@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit 69e4a9d2b3f5adf5af4feeab0a9f505da971265a category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Convert st_ops->data to the shadow type of the struct_ops map. The shadow type of a struct_ops type is a variant of the original struct type providing a way to access/change the values in the maps of the struct_ops type. bpf_map__initial_value() will return st_ops->data for struct_ops types. The skeleton is going to use it as the pointer to the shadow type of the original struct type. One of the main differences between the original struct type and the shadow type is that all function pointers of the shadow type are converted to pointers of struct bpf_program. Users can replace these bpf_program pointers with other BPF programs. The st_ops->progs[] will be updated before updating the value of a map to reflect the changes made by users. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240229064523.2091270-3-thinker.li@gmail.com Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- tools/lib/bpf/libbpf.c | 40 ++++++++++++++++++++++++++++++++++++++-- 1 file changed, 38 insertions(+), 2 deletions(-) diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index dc3aecf1f070..5f8f0dd589e7 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -999,6 +999,19 @@ static bool bpf_map__is_struct_ops(const struct bpf_map *map) return map->def.type == BPF_MAP_TYPE_STRUCT_OPS; } +static bool is_valid_st_ops_program(struct bpf_object *obj, + const struct bpf_program *prog) +{ + int i; + + for (i = 0; i < obj->nr_programs; i++) { + if (&obj->programs[i] == prog) + return prog->type == BPF_PROG_TYPE_STRUCT_OPS; + } + + return false; +} + /* Init the map's fields that depend on kern_btf */ static int bpf_map__init_kern_struct_ops(struct bpf_map *map) { @@ -1087,9 +1100,16 @@ static int bpf_map__init_kern_struct_ops(struct bpf_map *map) if (btf_is_ptr(mtype)) { struct bpf_program *prog; - prog = st_ops->progs[i]; + /* Update the value from the shadow type */ + prog = *(void **)mdata; + st_ops->progs[i] = prog; if (!prog) continue; + if (!is_valid_st_ops_program(obj, prog)) { + pr_warn("struct_ops init_kern %s: member %s is not a struct_ops program\n", + map->name, mname); + return -ENOTSUP; + } kern_mtype = skip_mods_and_typedefs(kern_btf, kern_mtype->type, @@ -9173,7 +9193,9 @@ static struct bpf_map *find_struct_ops_map_by_offset(struct bpf_object *obj, return NULL; } -/* Collect the reloc from ELF and populate the st_ops->progs[] */ +/* Collect the reloc from ELF, populate the st_ops->progs[], and update + * st_ops->data for shadow type. + */ static int bpf_object__collect_st_ops_relos(struct bpf_object *obj, Elf64_Shdr *shdr, Elf_Data *data) { @@ -9287,6 +9309,14 @@ static int bpf_object__collect_st_ops_relos(struct bpf_object *obj, } st_ops->progs[member_idx] = prog; + + /* st_ops->data will be exposed to users, being returned by + * bpf_map__initial_value() as a pointer to the shadow + * type. All function pointers in the original struct type + * should be converted to a pointer to struct bpf_program + * in the shadow type. 
+ */ + *((struct bpf_program **)(st_ops->data + moff)) = prog; } return 0; @@ -9743,6 +9773,12 @@ int bpf_map__set_initial_value(struct bpf_map *map, void *bpf_map__initial_value(struct bpf_map *map, size_t *psize) { + if (bpf_map__is_struct_ops(map)) { + if (psize) + *psize = map->def.value_size; + return map->st_ops->data; + } + if (!map->mmaped) return NULL; *psize = map->def.value_size; -- 2.34.1
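A hedged sketch of what this change means for a libbpf user working without a skeleton (map and program names are hypothetical): the value returned for a struct_ops map is now the shadow data, in which every function-pointer member holds a struct bpf_program *.

static int swap_first_op(struct bpf_object *obj)
{
	struct bpf_map *map = bpf_object__find_map_by_name(obj, "my_ops_map");
	size_t sz;
	void *shadow;

	if (!map)
		return -1;
	shadow = bpf_map__initial_value(map, &sz);
	if (!shadow)
		return -1;
	/* assuming the first member of the ops struct is a function pointer,
	 * in the shadow data it is a struct bpf_program *; point it at another
	 * program before bpf_object__load() */
	*(struct bpf_program **)shadow =
		bpf_object__find_program_by_name(obj, "alt_impl");
	return 0;
}

In practice the generated skeleton (next patches) wraps this access behind typed shadow pointers.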
From: Kui-Feng Lee <thinker.li@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit a7b0fa352eafef95bd0d736ca94965d3f884ad18 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Declares and defines a pointer of the shadow type for each struct_ops map. The code generator will create an anonymous struct type as the shadow type for each struct_ops map. The shadow type is translated from the original struct type of the map. The user of the skeleton uses these pointers to access the values of struct_ops maps. However, shadow types only support certain types of fields, including scalar types and function pointers. Any fields of unsupported types are translated into an array of characters to occupy the space of the original field. Function pointers are translated into pointers to struct bpf_program. Additionally, padding fields are generated to occupy the space between two consecutive fields. The pointers to the shadow types of struct_ops maps are initialized when *__open_opts() in skeletons is called. For a map called FOO, the user can access it through the pointer at skel->struct_ops.FOO. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20240229064523.2091270-4-thinker.li@gmail.com Conflicts: tools/bpf/bpftool/gen.c [Fix conflict because of context diff.] Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- tools/bpf/bpftool/gen.c | 237 +++++++++++++++++++++++++++++++++++++++- 1 file changed, 236 insertions(+), 1 deletion(-) diff --git a/tools/bpf/bpftool/gen.c b/tools/bpf/bpftool/gen.c index 882bf8e6e70e..34515fbe9414 100644 --- a/tools/bpf/bpftool/gen.c +++ b/tools/bpf/bpftool/gen.c @@ -54,6 +54,20 @@ static bool str_has_suffix(const char *str, const char *suffix) return true; } +static const struct btf_type * +resolve_func_ptr(const struct btf *btf, __u32 id, __u32 *res_id) +{ + const struct btf_type *t; + + t = skip_mods_and_typedefs(btf, id, NULL); + if (!btf_is_ptr(t)) + return NULL; + + t = skip_mods_and_typedefs(btf, t->type, res_id); + + return btf_is_func_proto(t) ? t : NULL; +} + static void get_obj_name(char *name, const char *file) { /* Using basename() GNU version which doesn't modify arg.
*/ @@ -903,6 +917,207 @@ codegen_progs_skeleton(struct bpf_object *obj, size_t prog_cnt, bool populate_li } } +static int walk_st_ops_shadow_vars(struct btf *btf, const char *ident, + const struct btf_type *map_type, __u32 map_type_id) +{ + LIBBPF_OPTS(btf_dump_emit_type_decl_opts, opts, .indent_level = 3); + const struct btf_type *member_type; + __u32 offset, next_offset = 0; + const struct btf_member *m; + struct btf_dump *d = NULL; + const char *member_name; + __u32 member_type_id; + int i, err = 0, n; + int size; + + d = btf_dump__new(btf, codegen_btf_dump_printf, NULL, NULL); + if (!d) + return -errno; + + n = btf_vlen(map_type); + for (i = 0, m = btf_members(map_type); i < n; i++, m++) { + member_type = skip_mods_and_typedefs(btf, m->type, &member_type_id); + member_name = btf__name_by_offset(btf, m->name_off); + + offset = m->offset / 8; + if (next_offset < offset) + printf("\t\t\tchar __padding_%d[%d];\n", i, offset - next_offset); + + switch (btf_kind(member_type)) { + case BTF_KIND_INT: + case BTF_KIND_FLOAT: + case BTF_KIND_ENUM: + case BTF_KIND_ENUM64: + /* scalar type */ + printf("\t\t\t"); + opts.field_name = member_name; + err = btf_dump__emit_type_decl(d, member_type_id, &opts); + if (err) { + p_err("Failed to emit type declaration for %s: %d", member_name, err); + goto out; + } + printf(";\n"); + + size = btf__resolve_size(btf, member_type_id); + if (size < 0) { + p_err("Failed to resolve size of %s: %d\n", member_name, size); + err = size; + goto out; + } + + next_offset = offset + size; + break; + + case BTF_KIND_PTR: + if (resolve_func_ptr(btf, m->type, NULL)) { + /* Function pointer */ + printf("\t\t\tstruct bpf_program *%s;\n", member_name); + + next_offset = offset + sizeof(void *); + break; + } + /* All pointer types are unsupported except for + * function pointers. + */ + fallthrough; + + default: + /* Unsupported types + * + * Types other than scalar types and function + * pointers are currently not supported in order to + * prevent conflicts in the generated code caused + * by multiple definitions. For instance, if the + * struct type FOO is used in a struct_ops map, + * bpftool has to generate definitions for FOO, + * which may result in conflicts if FOO is defined + * in different skeleton files. + */ + size = btf__resolve_size(btf, member_type_id); + if (size < 0) { + p_err("Failed to resolve size of %s: %d\n", member_name, size); + err = size; + goto out; + } + printf("\t\t\tchar __unsupported_%d[%d];\n", i, size); + + next_offset = offset + size; + break; + } + } + + /* Cannot fail since it must be a struct type */ + size = btf__resolve_size(btf, map_type_id); + if (next_offset < (__u32)size) + printf("\t\t\tchar __padding_end[%d];\n", size - next_offset); + +out: + btf_dump__free(d); + + return err; +} + +/* Generate the pointer of the shadow type for a struct_ops map. + * + * This function adds a pointer of the shadow type for a struct_ops map. + * The members of a struct_ops map can be exported through a pointer to a + * shadow type. The user can access these members through the pointer. + * + * A shadow type includes not all members, only members of some types. + * They are scalar types and function pointers. The function pointers are + * translated to the pointer of the struct bpf_program. The scalar types + * are translated to the original type without any modifiers. + * + * Unsupported types will be translated to a char array to occupy the same + * space as the original field, being renamed as __unsupported_*. 
The user + * should treat these fields as opaque data. + */ +static int gen_st_ops_shadow_type(const char *obj_name, struct btf *btf, const char *ident, + const struct bpf_map *map) +{ + const struct btf_type *map_type; + const char *type_name; + __u32 map_type_id; + int err; + + map_type_id = bpf_map__btf_value_type_id(map); + if (map_type_id == 0) + return -EINVAL; + map_type = btf__type_by_id(btf, map_type_id); + if (!map_type) + return -EINVAL; + + type_name = btf__name_by_offset(btf, map_type->name_off); + + printf("\t\tstruct %s__%s__%s {\n", obj_name, ident, type_name); + + err = walk_st_ops_shadow_vars(btf, ident, map_type, map_type_id); + if (err) + return err; + + printf("\t\t} *%s;\n", ident); + + return 0; +} + +static int gen_st_ops_shadow(const char *obj_name, struct btf *btf, struct bpf_object *obj) +{ + int err, st_ops_cnt = 0; + struct bpf_map *map; + char ident[256]; + + if (!btf) + return 0; + + /* Generate the pointers to shadow types of + * struct_ops maps. + */ + bpf_object__for_each_map(map, obj) { + if (bpf_map__type(map) != BPF_MAP_TYPE_STRUCT_OPS) + continue; + if (!get_map_ident(map, ident, sizeof(ident))) + continue; + + if (st_ops_cnt == 0) /* first struct_ops map */ + printf("\tstruct {\n"); + st_ops_cnt++; + + err = gen_st_ops_shadow_type(obj_name, btf, ident, map); + if (err) + return err; + } + + if (st_ops_cnt) + printf("\t} struct_ops;\n"); + + return 0; +} + +/* Generate the code to initialize the pointers of shadow types. */ +static void gen_st_ops_shadow_init(struct btf *btf, struct bpf_object *obj) +{ + struct bpf_map *map; + char ident[256]; + + if (!btf) + return; + + /* Initialize the pointers to_ops shadow types of + * struct_ops maps. + */ + bpf_object__for_each_map(map, obj) { + if (bpf_map__type(map) != BPF_MAP_TYPE_STRUCT_OPS) + continue; + if (!get_map_ident(map, ident, sizeof(ident))) + continue; + codegen("\ + \n\ + obj->struct_ops.%1$s = bpf_map__initial_value(obj->maps.%1$s, NULL);\n\ + \n\ + ", ident); + } +} + static int do_skeleton(int argc, char **argv) { char header_guard[MAX_OBJ_NAME_LEN + sizeof("__SKEL_H__")]; @@ -1046,6 +1261,11 @@ static int do_skeleton(int argc, char **argv) printf("\t} maps;\n"); } + btf = bpf_object__btf(obj); + err = gen_st_ops_shadow(obj_name, btf, obj); + if (err) + goto out; + if (prog_cnt) { printf("\tstruct {\n"); bpf_object__for_each_program(prog, obj) { @@ -1069,7 +1289,6 @@ static int do_skeleton(int argc, char **argv) printf("\t} links;\n"); } - btf = bpf_object__btf(obj); if (btf) { err = codegen_datasecs(obj, obj_name); if (err) @@ -1127,6 +1346,12 @@ static int do_skeleton(int argc, char **argv) if (err) \n\ goto err_out; \n\ \n\ + ", obj_name); + + gen_st_ops_shadow_init(btf, obj); + + codegen("\ + \n\ return obj; \n\ err_out: \n\ %1$s__destroy(obj); \n\ @@ -1436,6 +1661,10 @@ static int do_subskeleton(int argc, char **argv) printf("\t} maps;\n"); } + err = gen_st_ops_shadow(obj_name, btf, obj); + if (err) + goto out; + if (prog_cnt) { printf("\tstruct {\n"); bpf_object__for_each_program(prog, obj) { @@ -1547,6 +1776,12 @@ static int do_subskeleton(int argc, char **argv) if (err) \n\ goto err; \n\ \n\ + "); + + gen_st_ops_shadow_init(btf, obj); + + codegen("\ + \n\ return obj; \n\ err: \n\ %1$s__destroy(obj); \n\ -- 2.34.1
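For illustration, a minimal sketch of the translation rules described above, using a hypothetical struct_ops type (struct my_ops), map name (my_map) and object name (example); field sizes assume a 64-bit target and are illustrative only:

/* Hypothetical struct_ops type defined by a kernel module: */
struct my_ops {
	int (*handler)(int a, int b);	/* function pointer */
	int threshold;			/* scalar */
	struct list_head list;		/* unsupported (aggregate) type */
};

/* Roughly what the generated skeleton would expose for a map "my_map"
 * in an object "example" (naming follows the <obj>__<map>__<type>
 * pattern used by the generator):
 */
struct example__my_map__my_ops {
	struct bpf_program *handler;	/* func ptr -> struct bpf_program * */
	int threshold;			/* scalar kept, modifiers dropped */
	char __padding_2[4];		/* fills the gap before "list" */
	char __unsupported_2[16];	/* opaque stand-in for "list" */
} *my_map;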
From: Kui-Feng Lee <thinker.li@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit f2e81192e07e87897ff1296c96775eceea8f582a category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- The example in bpftool-gen.8 explains how to use the pointer of the shadow type to change the value of a field of a struct_ops map. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20240229064523.2091270-5-thinker.li@gmail.com Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- .../bpf/bpftool/Documentation/bpftool-gen.rst | 58 +++++++++++++++++-- 1 file changed, 52 insertions(+), 6 deletions(-) diff --git a/tools/bpf/bpftool/Documentation/bpftool-gen.rst b/tools/bpf/bpftool/Documentation/bpftool-gen.rst index 5006e724d1bc..5e60825818dd 100644 --- a/tools/bpf/bpftool/Documentation/bpftool-gen.rst +++ b/tools/bpf/bpftool/Documentation/bpftool-gen.rst @@ -257,18 +257,48 @@ EXAMPLES return 0; } -This is example BPF application with two BPF programs and a mix of BPF maps -and global variables. Source code is split across two source code files. +**$ cat example3.bpf.c** + +:: + + #include <linux/ptrace.h> + #include <linux/bpf.h> + #include <bpf/bpf_helpers.h> + /* This header file is provided by the bpf_testmod module. */ + #include "bpf_testmod.h" + + int test_2_result = 0; + + /* bpf_Testmod.ko calls this function, passing a "4" + * and testmod_map->data. + */ + SEC("struct_ops/test_2") + void BPF_PROG(test_2, int a, int b) + { + test_2_result = a + b; + } + + SEC(".struct_ops") + struct bpf_testmod_ops testmod_map = { + .test_2 = (void *)test_2, + .data = 0x1, + }; + +This is example BPF application with three BPF programs and a mix of BPF +maps and global variables. Source code is split across three source code +files. **$ clang --target=bpf -g example1.bpf.c -o example1.bpf.o** **$ clang --target=bpf -g example2.bpf.c -o example2.bpf.o** -**$ bpftool gen object example.bpf.o example1.bpf.o example2.bpf.o** +**$ clang --target=bpf -g example3.bpf.c -o example3.bpf.o** + +**$ bpftool gen object example.bpf.o example1.bpf.o example2.bpf.o example3.bpf.o** -This set of commands compiles *example1.bpf.c* and *example2.bpf.c* -individually and then statically links respective object files into the final -BPF ELF object file *example.bpf.o*. +This set of commands compiles *example1.bpf.c*, *example2.bpf.c* and +*example3.bpf.c* individually and then statically links respective object +files into the final BPF ELF object file *example.bpf.o*. **$ bpftool gen skeleton example.bpf.o name example | tee example.skel.h** @@ -291,7 +321,15 @@ BPF ELF object file *example.bpf.o*. struct bpf_map *data; struct bpf_map *bss; struct bpf_map *my_map; + struct bpf_map *testmod_map; } maps; + struct { + struct example__testmod_map__bpf_testmod_ops { + const struct bpf_program *test_1; + const struct bpf_program *test_2; + int data; + } *testmod_map; + } struct_ops; struct { struct bpf_program *handle_sys_enter; struct bpf_program *handle_sys_exit; @@ -304,6 +342,7 @@ BPF ELF object file *example.bpf.o*. struct { int x; } data; + int test_2_result; } *bss; struct example__data { _Bool global_flag; @@ -342,10 +381,16 @@ BPF ELF object file *example.bpf.o*. 
skel->rodata->param1 = 128; + /* Change the value through the pointer of shadow type */ + skel->struct_ops.testmod_map->data = 13; + err = example__load(skel); if (err) goto cleanup; + /* The result of the function test_2() */ + printf("test_2_result: %d\n", skel->bss->test_2_result); + err = example__attach(skel); if (err) goto cleanup; @@ -372,6 +417,7 @@ BPF ELF object file *example.bpf.o*. :: + test_2_result: 17 my_map name: my_map sys_enter prog FD: 8 my_static_var: 7 -- 2.34.1
From: Kui-Feng Lee <thinker.li@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit 0623e73317940d052216fb6eef4efd55a0a7f602 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Change the values of fields, including scalar types and function pointers, and check if the struct_ops map works as expected. The test changes the field "test_2" of "testmod_1" from a pointer to test_2() to a pointer to test_3(), and the field "data" to 13. The functions test_2() and test_3() both compute a new value for "test_2_result", but in different ways. Checking the value of "test_2_result" ensures the struct_ops map works as expected with changes made through shadow types. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240229064523.2091270-6-thinker.li@gmail.com Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- .../selftests/bpf/bpf_testmod/bpf_testmod.c | 11 ++++++++++- .../selftests/bpf/bpf_testmod/bpf_testmod.h | 8 ++++++++ .../bpf/prog_tests/test_struct_ops_module.c | 19 +++++++++++++++---- .../selftests/bpf/progs/struct_ops_module.c | 8 ++++++++ 4 files changed, 41 insertions(+), 5 deletions(-) diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c index 79238005edd1..2c5c64950781 100644 --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c @@ -534,6 +534,15 @@ static int bpf_testmod_ops_init_member(const struct btf_type *t, const struct btf_member *member, void *kdata, const void *udata) { + if (member->offset == offsetof(struct bpf_testmod_ops, data) * 8) { + /* For data fields, this function has to copy it and return + * 1 to indicate that the data has been handled by the + * struct_ops type, or the verifier will reject the map if + * the value of the data field is not zero. + */ + ((struct bpf_testmod_ops *)kdata)->data = ((struct bpf_testmod_ops *)udata)->data; + return 1; + } return 0; } @@ -554,7 +563,7 @@ static int bpf_dummy_reg(void *kdata) * initialized, so we need to check for NULL. */ if (ops->test_2) - ops->test_2(4, 3); + ops->test_2(4, ops->data); return 0; } diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h index c3b0cf788f9f..971458acfac3 100644 --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h @@ -35,6 +35,14 @@ struct bpf_testmod_ops { void (*test_2)(int a, int b); /* Used to test nullable arguments. */ int (*test_maybe_null)(int dummy, struct task_struct *task); + + /* The following fields are used to test shadow copies.
*/ + char onebyte; + struct { + int a; + int b; + } unsupported; + int data; }; #endif /* _BPF_TESTMOD_H */ diff --git a/tools/testing/selftests/bpf/prog_tests/test_struct_ops_module.c b/tools/testing/selftests/bpf/prog_tests/test_struct_ops_module.c index 8d833f0c7580..7d6facf46ebb 100644 --- a/tools/testing/selftests/bpf/prog_tests/test_struct_ops_module.c +++ b/tools/testing/selftests/bpf/prog_tests/test_struct_ops_module.c @@ -32,17 +32,23 @@ static void check_map_info(struct bpf_map_info *info) static void test_struct_ops_load(void) { - DECLARE_LIBBPF_OPTS(bpf_object_open_opts, opts); struct struct_ops_module *skel; struct bpf_map_info info = {}; struct bpf_link *link; int err; u32 len; - skel = struct_ops_module__open_opts(&opts); + skel = struct_ops_module__open(); if (!ASSERT_OK_PTR(skel, "struct_ops_module_open")) return; + skel->struct_ops.testmod_1->data = 13; + skel->struct_ops.testmod_1->test_2 = skel->progs.test_3; + /* Since test_2() is not being used, it should be disabled from + * auto-loading, or it will fail to load. + */ + bpf_program__set_autoload(skel->progs.test_2, false); + err = struct_ops_module__load(skel); if (!ASSERT_OK(err, "struct_ops_module_load")) goto cleanup; @@ -56,8 +62,13 @@ static void test_struct_ops_load(void) link = bpf_map__attach_struct_ops(skel->maps.testmod_1); ASSERT_OK_PTR(link, "attach_test_mod_1"); - /* test_2() will be called from bpf_dummy_reg() in bpf_testmod.c */ - ASSERT_EQ(skel->bss->test_2_result, 7, "test_2_result"); + /* test_3() will be called from bpf_dummy_reg() in bpf_testmod.c + * + * In bpf_testmod.c it will pass 4 and 13 (the value of data) to + * .test_2. So, the value of test_2_result should be 20 (4 + 13 + + * 3). + */ + ASSERT_EQ(skel->bss->test_2_result, 20, "check_shadow_variables"); bpf_link__destroy(link); diff --git a/tools/testing/selftests/bpf/progs/struct_ops_module.c b/tools/testing/selftests/bpf/progs/struct_ops_module.c index b78746b3cef3..25952fa09348 100644 --- a/tools/testing/selftests/bpf/progs/struct_ops_module.c +++ b/tools/testing/selftests/bpf/progs/struct_ops_module.c @@ -21,9 +21,17 @@ void BPF_PROG(test_2, int a, int b) test_2_result = a + b; } +SEC("struct_ops/test_3") +int BPF_PROG(test_3, int a, int b) +{ + test_2_result = a + b + 3; + return a + b + 3; +} + SEC(".struct_ops.link") struct bpf_testmod_ops testmod_1 = { .test_1 = (void *)test_1, .test_2 = (void *)test_2, + .data = 0x1, }; -- 2.34.1
From: Kui-Feng Lee <thinker.li@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit 73e4f9e615d7b99f39663d4722dc73e8fa5db5f9 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Perform all validations when updating values of struct_ops maps. Doing validation in st_ops->reg() and st_ops->update() is not necessary anymore. However, tcp_register_congestion_control() has been called in various places. It still needs to do validations. Cc: netdev@vger.kernel.org Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240224223418.526631-2-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- kernel/bpf/bpf_struct_ops.c | 11 ++++++----- net/ipv4/tcp_cong.c | 6 +----- 2 files changed, 7 insertions(+), 10 deletions(-) diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c index 0d7be97a2411..c244ed5114fd 100644 --- a/kernel/bpf/bpf_struct_ops.c +++ b/kernel/bpf/bpf_struct_ops.c @@ -667,13 +667,14 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, *(unsigned long *)(udata + moff) = prog->aux->id; } + if (st_ops->validate) { + err = st_ops->validate(kdata); + if (err) + goto reset_unlock; + } + if (st_map->map.map_flags & BPF_F_LINK) { err = 0; - if (st_ops->validate) { - err = st_ops->validate(kdata); - if (err) - goto reset_unlock; - } arch_protect_bpf_trampoline(st_map->image, PAGE_SIZE); /* Let bpf_link handle registration & unregistration. * diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c index 95dbb2799be4..48617d99abb0 100644 --- a/net/ipv4/tcp_cong.c +++ b/net/ipv4/tcp_cong.c @@ -145,11 +145,7 @@ EXPORT_SYMBOL_GPL(tcp_unregister_congestion_control); int tcp_update_congestion_control(struct tcp_congestion_ops *ca, struct tcp_congestion_ops *old_ca) { struct tcp_congestion_ops *existing; - int ret; - - ret = tcp_validate_congestion_control(ca); - if (ret) - return ret; + int ret = 0; ca->key = jhash(ca->name, sizeof(ca->name), strlen(ca->name)); -- 2.34.1
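As a rough illustration of where the subsystem-side check now runs: a .validate callback (hypothetical names below, not taken from this series) is invoked from bpf_struct_ops_map_update_elem() for every struct_ops map update, whether or not the map uses BPF_F_LINK.

/* Hypothetical subsystem-side callback; a subsystem wires it up via the
 * .validate member of its struct bpf_struct_ops definition.
 */
static int my_ops_validate(void *kdata)
{
	struct my_ops *ops = kdata;	/* struct my_ops is hypothetical */

	/* reject an incomplete ops table before it can be registered */
	if (!ops->handler)
		return -EINVAL;
	return 0;
}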
From: Kui-Feng Lee <thinker.li@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit 187e2af05abe6bf80581490239c449456627d17a category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- The BPF struct_ops previously only allowed one page of trampolines. Each function pointer of a struct_ops is implemented by a struct_ops bpf program. Each struct_ops bpf program requires a trampoline. The following selftest patch shows each page can hold a little more than 20 trampolines. While one page is more than enough for the tcp-cc use case, the sched_ext use case shows that one page is not always enough and hits the one-page limit. This patch overcomes the one-page limit by allocating another page when needed; the total is capped at MAX_TRAMP_IMAGE_PAGES (8) pages, which is more than enough for reasonable usage. The variable st_map->image has been changed to st_map->image_pages, and its type has been changed to an array of pointers to pages. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240224223418.526631-3-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- include/linux/bpf.h | 4 +- kernel/bpf/bpf_struct_ops.c | 130 ++++++++++++++++++++++----------- net/bpf/bpf_dummy_struct_ops.c | 12 +-- 3 files changed, 96 insertions(+), 50 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 6bb262b615ff..1c57b5ac2678 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1866,7 +1866,9 @@ int bpf_struct_ops_prepare_trampoline(struct bpf_tramp_links *tlinks, struct bpf_tramp_link *link, const struct btf_func_model *model, void *stub_func, - void *image, void *image_end); + void **image, u32 *image_off, + bool allow_alloc); +void bpf_struct_ops_image_free(void *image); static inline bool bpf_try_module_get(const void *data, struct module *owner) { if (owner == BPF_MODULE_OWNER) diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c index c244ed5114fd..cb6deeead404 100644 --- a/kernel/bpf/bpf_struct_ops.c +++ b/kernel/bpf/bpf_struct_ops.c @@ -18,6 +18,8 @@ struct bpf_struct_ops_value { char data[] ____cacheline_aligned_in_smp; }; +#define MAX_TRAMP_IMAGE_PAGES 8 + struct bpf_struct_ops_map { struct bpf_map map; struct rcu_head rcu; @@ -30,12 +32,11 @@ struct bpf_struct_ops_map { */ struct bpf_link **links; u32 links_cnt; - /* image is a page that has all the trampolines + u32 image_pages_cnt; + /* image_pages is an array of pages that has all the trampolines * that stores the func args before calling the bpf_prog. - * A PAGE_SIZE "image" is enough to store all trampoline for - * "links[]". */ - void *image; + void *image_pages[MAX_TRAMP_IMAGE_PAGES]; /* The owner moduler's btf.
*/ struct btf *btf; /* uvalue->data stores the kernel struct @@ -116,6 +117,31 @@ static bool is_valid_value_type(struct btf *btf, s32 value_id, return true; } +static void *bpf_struct_ops_image_alloc(void) +{ + void *image; + int err; + + err = bpf_jit_charge_modmem(PAGE_SIZE); + if (err) + return ERR_PTR(err); + image = arch_alloc_bpf_trampoline(PAGE_SIZE); + if (!image) { + bpf_jit_uncharge_modmem(PAGE_SIZE); + return ERR_PTR(-ENOMEM); + } + + return image; +} + +void bpf_struct_ops_image_free(void *image) +{ + if (image) { + arch_free_bpf_trampoline(image, PAGE_SIZE); + bpf_jit_uncharge_modmem(PAGE_SIZE); + } +} + #define MAYBE_NULL_SUFFIX "__nullable" #define MAX_STUB_NAME 128 @@ -456,6 +482,15 @@ static void bpf_struct_ops_map_put_progs(struct bpf_struct_ops_map *st_map) } } +static void bpf_struct_ops_map_free_image(struct bpf_struct_ops_map *st_map) +{ + int i; + + for (i = 0; i < st_map->image_pages_cnt; i++) + bpf_struct_ops_image_free(st_map->image_pages[i]); + st_map->image_pages_cnt = 0; +} + static int check_zero_holes(const struct btf *btf, const struct btf_type *t, void *data) { const struct btf_member *member; @@ -501,9 +536,12 @@ const struct bpf_link_ops bpf_struct_ops_link_lops = { int bpf_struct_ops_prepare_trampoline(struct bpf_tramp_links *tlinks, struct bpf_tramp_link *link, const struct btf_func_model *model, - void *stub_func, void *image, void *image_end) + void *stub_func, + void **_image, u32 *_image_off, + bool allow_alloc) { - u32 flags = BPF_TRAMP_F_INDIRECT; + u32 image_off = *_image_off, flags = BPF_TRAMP_F_INDIRECT; + void *image = *_image; int size; tlinks[BPF_TRAMP_FENTRY].links[0] = link; @@ -513,12 +551,32 @@ int bpf_struct_ops_prepare_trampoline(struct bpf_tramp_links *tlinks, flags |= BPF_TRAMP_F_RET_FENTRY_RET; size = arch_bpf_trampoline_size(model, flags, tlinks, NULL); - if (size < 0) - return size; - if (size > (unsigned long)image_end - (unsigned long)image) - return -E2BIG; - return arch_prepare_bpf_trampoline(NULL, image, image_end, + if (size <= 0) + return size ? : -EFAULT; + + /* Allocate image buffer if necessary */ + if (!image || size > PAGE_SIZE - image_off) { + if (!allow_alloc) + return -E2BIG; + + image = bpf_struct_ops_image_alloc(); + if (IS_ERR(image)) + return PTR_ERR(image); + image_off = 0; + } + + size = arch_prepare_bpf_trampoline(NULL, image + image_off, + image + PAGE_SIZE, model, flags, tlinks, stub_func); + if (size <= 0) { + if (image != *_image) + bpf_struct_ops_image_free(image); + return size ? 
: -EFAULT; + } + + *_image = image; + *_image_off = image_off + size; + return 0; } static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, @@ -534,8 +592,8 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, struct bpf_tramp_links *tlinks; void *udata, *kdata; int prog_fd, err; - void *image, *image_end; - u32 i; + u32 i, trampoline_start, image_off = 0; + void *cur_image = NULL, *image = NULL; if (flags) return -EINVAL; @@ -573,8 +631,6 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, udata = &uvalue->data; kdata = &kvalue->data; - image = st_map->image; - image_end = st_map->image + PAGE_SIZE; module_type = btf_type_by_id(btf_vmlinux, st_ops_ids[IDX_MODULE_ID]); for_each_member(i, t, member) { @@ -653,15 +709,24 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, &bpf_struct_ops_link_lops, prog); st_map->links[i] = &link->link; + trampoline_start = image_off; err = bpf_struct_ops_prepare_trampoline(tlinks, link, - &st_ops->func_models[i], - *(void **)(st_ops->cfi_stubs + moff), - image, image_end); + &st_ops->func_models[i], + *(void **)(st_ops->cfi_stubs + moff), + &image, &image_off, + st_map->image_pages_cnt < MAX_TRAMP_IMAGE_PAGES); + if (err) + goto reset_unlock; + + if (cur_image != image) { + st_map->image_pages[st_map->image_pages_cnt++] = image; + cur_image = image; + trampoline_start = 0; + } if (err < 0) goto reset_unlock; - *(void **)(kdata + moff) = image + cfi_get_offset(); - image += err; + *(void **)(kdata + moff) = image + trampoline_start + cfi_get_offset(); /* put prog_id to udata */ *(unsigned long *)(udata + moff) = prog->aux->id; @@ -672,10 +737,11 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, if (err) goto reset_unlock; } + for (i = 0; i < st_map->image_pages_cnt; i++) + arch_protect_bpf_trampoline(st_map->image_pages[i], PAGE_SIZE); if (st_map->map.map_flags & BPF_F_LINK) { err = 0; - arch_protect_bpf_trampoline(st_map->image, PAGE_SIZE); /* Let bpf_link handle registration & unregistration. * * Pair with smp_load_acquire() during lookup_elem(). @@ -684,7 +750,6 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, goto unlock; } - arch_protect_bpf_trampoline(st_map->image, PAGE_SIZE); err = st_ops->reg(kdata); if (likely(!err)) { /* This refcnt increment on the map here after @@ -707,9 +772,9 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, * there was a race in registering the struct_ops (under the same name) to * a sub-system through different struct_ops's maps. 
*/ - arch_unprotect_bpf_trampoline(st_map->image, PAGE_SIZE); reset_unlock: + bpf_struct_ops_map_free_image(st_map); bpf_struct_ops_map_put_progs(st_map); memset(uvalue, 0, map->value_size); memset(kvalue, 0, map->value_size); @@ -776,10 +841,7 @@ static void __bpf_struct_ops_map_free(struct bpf_map *map) if (st_map->links) bpf_struct_ops_map_put_progs(st_map); bpf_map_area_free(st_map->links); - if (st_map->image) { - arch_free_bpf_trampoline(st_map->image, PAGE_SIZE); - bpf_jit_uncharge_modmem(PAGE_SIZE); - } + bpf_struct_ops_map_free_image(st_map); bpf_map_area_free(st_map->uvalue); bpf_map_area_free(st_map); } @@ -889,20 +951,6 @@ static struct bpf_map *bpf_struct_ops_map_alloc(union bpf_attr *attr) st_map->st_ops_desc = st_ops_desc; map = &st_map->map; - ret = bpf_jit_charge_modmem(PAGE_SIZE); - if (ret) - goto errout_free; - - st_map->image = arch_alloc_bpf_trampoline(PAGE_SIZE); - if (!st_map->image) { - /* __bpf_struct_ops_map_free() uses st_map->image as flag - * for "charged or not". In this case, we need to unchange - * here. - */ - bpf_jit_uncharge_modmem(PAGE_SIZE); - ret = -ENOMEM; - goto errout_free; - } st_map->uvalue = bpf_map_area_alloc(vt->size, NUMA_NO_NODE); st_map->links_cnt = btf_type_vlen(t); st_map->links = diff --git a/net/bpf/bpf_dummy_struct_ops.c b/net/bpf/bpf_dummy_struct_ops.c index 02de71719aed..1b5f812e6972 100644 --- a/net/bpf/bpf_dummy_struct_ops.c +++ b/net/bpf/bpf_dummy_struct_ops.c @@ -91,6 +91,7 @@ int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr, struct bpf_tramp_link *link = NULL; void *image = NULL; unsigned int op_idx; + u32 image_off = 0; int prog_ret; s32 type_id; int err; @@ -114,12 +115,6 @@ int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr, goto out; } - image = arch_alloc_bpf_trampoline(PAGE_SIZE); - if (!image) { - err = -ENOMEM; - goto out; - } - link = kzalloc(sizeof(*link), GFP_USER); if (!link) { err = -ENOMEM; @@ -133,7 +128,8 @@ int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr, err = bpf_struct_ops_prepare_trampoline(tlinks, link, &st_ops->func_models[op_idx], &dummy_ops_test_ret_function, - image, image + PAGE_SIZE); + &image, &image_off, + true); if (err < 0) goto out; @@ -147,7 +143,7 @@ int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr, err = -EFAULT; out: kfree(args); - arch_free_bpf_trampoline(image, PAGE_SIZE); + bpf_struct_ops_image_free(image); if (link) bpf_link_put(&link->link); kfree(tlinks); -- 2.34.1
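The core of the new allocation policy can be sketched as a single predicate (simplified, userspace stand-ins for the kernel names; not the kernel code): a fresh page is allocated only when there is no current page or the next trampoline no longer fits in it, and the caller stops allocating once MAX_TRAMP_IMAGE_PAGES pages are in use.

#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE 4096	/* illustrative; the kernel uses the arch value */

/* Mirrors the "!image || size > PAGE_SIZE - image_off" test in
 * bpf_struct_ops_prepare_trampoline() after this patch.
 */
static bool needs_new_page(const void *image, size_t image_off, size_t tramp_size)
{
	return !image || tramp_size > PAGE_SIZE - image_off;
}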
From: Kui-Feng Lee <thinker.li@gmail.com> mainline inclusion from mainline-v6.9-rc1 commit 93bc28d859e57f1a654d3b63600d14c85c5630a4 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Create and load a struct_ops map with a large number of struct_ops programs to generate trampolines taking a size over multiple pages. The map includes 40 programs. Their trampolines takes 6.6k+, more than 1.5 pages, on x86. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240224223418.526631-4-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- .../selftests/bpf/bpf_testmod/bpf_testmod.h | 44 ++++++++ .../prog_tests/test_struct_ops_multi_pages.c | 30 ++++++ .../bpf/progs/struct_ops_multi_pages.c | 102 ++++++++++++++++++ 3 files changed, 176 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/test_struct_ops_multi_pages.c create mode 100644 tools/testing/selftests/bpf/progs/struct_ops_multi_pages.c diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h index 971458acfac3..622d24e29d35 100644 --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.h @@ -43,6 +43,50 @@ struct bpf_testmod_ops { int b; } unsupported; int data; + + /* The following pointers are used to test the maps having multiple + * pages of trampolines. + */ + int (*tramp_1)(int value); + int (*tramp_2)(int value); + int (*tramp_3)(int value); + int (*tramp_4)(int value); + int (*tramp_5)(int value); + int (*tramp_6)(int value); + int (*tramp_7)(int value); + int (*tramp_8)(int value); + int (*tramp_9)(int value); + int (*tramp_10)(int value); + int (*tramp_11)(int value); + int (*tramp_12)(int value); + int (*tramp_13)(int value); + int (*tramp_14)(int value); + int (*tramp_15)(int value); + int (*tramp_16)(int value); + int (*tramp_17)(int value); + int (*tramp_18)(int value); + int (*tramp_19)(int value); + int (*tramp_20)(int value); + int (*tramp_21)(int value); + int (*tramp_22)(int value); + int (*tramp_23)(int value); + int (*tramp_24)(int value); + int (*tramp_25)(int value); + int (*tramp_26)(int value); + int (*tramp_27)(int value); + int (*tramp_28)(int value); + int (*tramp_29)(int value); + int (*tramp_30)(int value); + int (*tramp_31)(int value); + int (*tramp_32)(int value); + int (*tramp_33)(int value); + int (*tramp_34)(int value); + int (*tramp_35)(int value); + int (*tramp_36)(int value); + int (*tramp_37)(int value); + int (*tramp_38)(int value); + int (*tramp_39)(int value); + int (*tramp_40)(int value); }; #endif /* _BPF_TESTMOD_H */ diff --git a/tools/testing/selftests/bpf/prog_tests/test_struct_ops_multi_pages.c b/tools/testing/selftests/bpf/prog_tests/test_struct_ops_multi_pages.c new file mode 100644 index 000000000000..645d32b5160c --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/test_struct_ops_multi_pages.c @@ -0,0 +1,30 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. 
*/ +#include <test_progs.h> + +#include "struct_ops_multi_pages.skel.h" + +static void do_struct_ops_multi_pages(void) +{ + struct struct_ops_multi_pages *skel; + struct bpf_link *link; + + /* The size of all trampolines of skel->maps.multi_pages should be + * over 1 page (at least for x86). + */ + skel = struct_ops_multi_pages__open_and_load(); + if (!ASSERT_OK_PTR(skel, "struct_ops_multi_pages_open_and_load")) + return; + + link = bpf_map__attach_struct_ops(skel->maps.multi_pages); + ASSERT_OK_PTR(link, "attach_multi_pages"); + + bpf_link__destroy(link); + struct_ops_multi_pages__destroy(skel); +} + +void test_struct_ops_multi_pages(void) +{ + if (test__start_subtest("multi_pages")) + do_struct_ops_multi_pages(); +} diff --git a/tools/testing/selftests/bpf/progs/struct_ops_multi_pages.c b/tools/testing/selftests/bpf/progs/struct_ops_multi_pages.c new file mode 100644 index 000000000000..9efcc6e4d356 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/struct_ops_multi_pages.c @@ -0,0 +1,102 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */ +#include <vmlinux.h> +#include <bpf/bpf_helpers.h> +#include <bpf/bpf_tracing.h> +#include "../bpf_testmod/bpf_testmod.h" + +char _license[] SEC("license") = "GPL"; + +#define TRAMP(x) \ + SEC("struct_ops/tramp_" #x) \ + int BPF_PROG(tramp_ ## x, int a) \ + { \ + return a; \ + } + +TRAMP(1) +TRAMP(2) +TRAMP(3) +TRAMP(4) +TRAMP(5) +TRAMP(6) +TRAMP(7) +TRAMP(8) +TRAMP(9) +TRAMP(10) +TRAMP(11) +TRAMP(12) +TRAMP(13) +TRAMP(14) +TRAMP(15) +TRAMP(16) +TRAMP(17) +TRAMP(18) +TRAMP(19) +TRAMP(20) +TRAMP(21) +TRAMP(22) +TRAMP(23) +TRAMP(24) +TRAMP(25) +TRAMP(26) +TRAMP(27) +TRAMP(28) +TRAMP(29) +TRAMP(30) +TRAMP(31) +TRAMP(32) +TRAMP(33) +TRAMP(34) +TRAMP(35) +TRAMP(36) +TRAMP(37) +TRAMP(38) +TRAMP(39) +TRAMP(40) + +#define F_TRAMP(x) .tramp_ ## x = (void *)tramp_ ## x + +SEC(".struct_ops.link") +struct bpf_testmod_ops multi_pages = { + F_TRAMP(1), + F_TRAMP(2), + F_TRAMP(3), + F_TRAMP(4), + F_TRAMP(5), + F_TRAMP(6), + F_TRAMP(7), + F_TRAMP(8), + F_TRAMP(9), + F_TRAMP(10), + F_TRAMP(11), + F_TRAMP(12), + F_TRAMP(13), + F_TRAMP(14), + F_TRAMP(15), + F_TRAMP(16), + F_TRAMP(17), + F_TRAMP(18), + F_TRAMP(19), + F_TRAMP(20), + F_TRAMP(21), + F_TRAMP(22), + F_TRAMP(23), + F_TRAMP(24), + F_TRAMP(25), + F_TRAMP(26), + F_TRAMP(27), + F_TRAMP(28), + F_TRAMP(29), + F_TRAMP(30), + F_TRAMP(31), + F_TRAMP(32), + F_TRAMP(33), + F_TRAMP(34), + F_TRAMP(35), + F_TRAMP(36), + F_TRAMP(37), + F_TRAMP(38), + F_TRAMP(39), + F_TRAMP(40), +}; -- 2.34.1
From: Andrii Nakryiko <andrii@kernel.org> mainline inclusion from mainline-v6.9-rc1 commit 66c8473135c62f478301a0e5b3012f203562dfa6 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- prog->aux->sleepable is checked very frequently as part of (some) BPF program run hot paths. So this extra aux indirection seems wasteful and on busy systems might cause unnecessary memory cache misses. Let's move sleepable flag into prog itself to eliminate unnecessary pointer dereference. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Message-ID: <20240309004739.2961431-1-andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Conflicts: include/linux/bpf.h. kernel/bpf/syscall.c kernel/trace/bpf_trace.c [Fix conflicts because of context diff and CVE patches 12775e7694d47.] Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- include/linux/bpf.h | 8 ++++---- kernel/bpf/bpf_iter.c | 4 ++-- kernel/bpf/core.c | 2 +- kernel/bpf/syscall.c | 8 ++++---- kernel/bpf/trampoline.c | 4 ++-- kernel/bpf/verifier.c | 12 ++++++------ kernel/events/core.c | 2 +- kernel/trace/bpf_trace.c | 2 +- net/bpf/bpf_dummy_struct_ops.c | 2 +- 9 files changed, 22 insertions(+), 22 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 1c57b5ac2678..61955afeb201 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1528,7 +1528,6 @@ struct bpf_prog_aux { bool offload_requested; /* Program is bound and offloaded to the netdev. */ bool attach_btf_trace; /* true if attaching to BTF-enabled raw tp */ bool func_proto_unreliable; - bool sleepable; bool tail_call_reachable; bool xdp_has_frags; /* BTF_KIND_FUNC_PROTO for valid attach_btf_id */ @@ -1622,7 +1621,8 @@ struct bpf_prog { enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */ call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */ call_get_func_ip:1, /* Do we call get_func_ip() */ - tstamp_type_access:1; /* Accessed __sk_buff->tstamp_type */ + tstamp_type_access:1, /* Accessed __sk_buff->tstamp_type */ + sleepable:1; /* BPF program is sleepable */ enum bpf_prog_type type; /* Type of BPF program */ enum bpf_attach_type expected_attach_type; /* For some prog types */ u32 len; /* Number of filter blocks */ @@ -2227,14 +2227,14 @@ bpf_prog_run_array_uprobe(const struct bpf_prog_array *array, old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx); item = &array->items[0]; while ((prog = READ_ONCE(item->prog))) { - if (!prog->aux->sleepable) + if (!prog->sleepable) rcu_read_lock(); run_ctx.bpf_cookie = item->bpf_cookie; ret &= run_prog(prog, ctx); item++; - if (!prog->aux->sleepable) + if (!prog->sleepable) rcu_read_unlock(); } bpf_reset_run_ctx(old_run_ctx); diff --git a/kernel/bpf/bpf_iter.c b/kernel/bpf/bpf_iter.c index 0fae79164187..112581cf97e7 100644 --- a/kernel/bpf/bpf_iter.c +++ b/kernel/bpf/bpf_iter.c @@ -548,7 +548,7 @@ int bpf_iter_link_attach(const union bpf_attr *attr, bpfptr_t uattr, return -ENOENT; /* Only allow sleepable program for resched-able iterator */ - if (prog->aux->sleepable && !bpf_iter_target_support_resched(tinfo)) + if (prog->sleepable && !bpf_iter_target_support_resched(tinfo)) return -EINVAL; link = kzalloc(sizeof(*link), GFP_USER | __GFP_NOWARN); @@ -697,7 +697,7 @@ int bpf_iter_run_prog(struct bpf_prog *prog, void *ctx) struct 
bpf_run_ctx run_ctx, *old_run_ctx; int ret; - if (prog->aux->sleepable) { + if (prog->sleepable) { rcu_read_lock_trace(); migrate_disable(); might_fault(); diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 50f305a31415..55e2b667b823 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -2757,7 +2757,7 @@ void __bpf_free_used_maps(struct bpf_prog_aux *aux, bool sleepable; u32 i; - sleepable = aux->sleepable; + sleepable = aux->prog->sleepable; for (i = 0; i < len; i++) { map = used_maps[i]; if (map->ops->map_poke_untrack) diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 55beb18bc371..da12f1ca8399 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -2165,7 +2165,7 @@ static void __bpf_prog_put_noref(struct bpf_prog *prog, bool deferred) btf_put(prog->aux->attach_btf); if (deferred) { - if (prog->aux->sleepable) + if (prog->sleepable) call_rcu_tasks_trace(&prog->aux->rcu, __bpf_prog_put_rcu); else call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu); @@ -2712,11 +2712,11 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size) } prog->expected_attach_type = attr->expected_attach_type; + prog->sleepable = !!(attr->prog_flags & BPF_F_SLEEPABLE); prog->aux->attach_btf = attach_btf; prog->aux->attach_btf_id = attr->attach_btf_id; prog->aux->dst_prog = dst_prog; prog->aux->dev_bound = !!attr->prog_ifindex; - prog->aux->sleepable = attr->prog_flags & BPF_F_SLEEPABLE; prog->aux->xdp_has_frags = attr->prog_flags & BPF_F_XDP_HAS_FRAGS; err = security_bpf_prog_alloc(prog->aux); @@ -2937,7 +2937,7 @@ static void bpf_link_free(struct bpf_link *link) bpf_link_free_id(link->id); if (link->prog) { - sleepable = link->prog->aux->sleepable; + sleepable = link->prog->sleepable; /* detach BPF program, clean up used resources */ ops->release(link); } @@ -5477,7 +5477,7 @@ static int bpf_prog_bind_map(union bpf_attr *attr) /* The bpf program will not access the bpf map, but for the sake of * simplicity, increase sleepable_refcnt for sleepable program as well. */ - if (prog->aux->sleepable) + if (prog->sleepable) atomic64_inc(&map->sleepable_refcnt); memcpy(used_maps_new, used_maps_old, sizeof(used_maps_old[0]) * prog->aux->used_map_cnt); diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c index ba4280057f49..afa5543cbcb1 100644 --- a/kernel/bpf/trampoline.c +++ b/kernel/bpf/trampoline.c @@ -1041,7 +1041,7 @@ void notrace __bpf_tramp_exit(struct bpf_tramp_image *tr) bpf_trampoline_enter_t bpf_trampoline_enter(const struct bpf_prog *prog) { - bool sleepable = prog->aux->sleepable; + bool sleepable = prog->sleepable; if (bpf_prog_check_recur(prog)) return sleepable ? __bpf_prog_enter_sleepable_recur : @@ -1056,7 +1056,7 @@ bpf_trampoline_enter_t bpf_trampoline_enter(const struct bpf_prog *prog) bpf_trampoline_exit_t bpf_trampoline_exit(const struct bpf_prog *prog) { - bool sleepable = prog->aux->sleepable; + bool sleepable = prog->sleepable; if (bpf_prog_check_recur(prog)) return sleepable ? 
__bpf_prog_exit_sleepable_recur : diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index e8ed32672d95..5ba537821d49 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -5425,7 +5425,7 @@ static int map_kptr_match_type(struct bpf_verifier_env *env, static bool in_sleepable(struct bpf_verifier_env *env) { - return env->prog->aux->sleepable; + return env->prog->sleepable; } /* The non-sleepable programs and sleepable programs with explicit bpf_rcu_read_lock() @@ -17873,7 +17873,7 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env, return -EINVAL; } - if (prog->aux->sleepable) + if (prog->sleepable) switch (map->map_type) { case BPF_MAP_TYPE_HASH: case BPF_MAP_TYPE_LRU_HASH: @@ -18057,7 +18057,7 @@ static int resolve_pseudo_ldimm64(struct bpf_verifier_env *env) return -E2BIG; } - if (env->prog->aux->sleepable) + if (env->prog->sleepable) atomic64_inc(&map->sleepable_refcnt); /* hold the map. If the program is rejected by verifier, * the map will be released by release_maps() or it @@ -20525,7 +20525,7 @@ int bpf_check_attach_target(struct bpf_verifier_log *log, } } - if (prog->aux->sleepable) { + if (prog->sleepable) { ret = -EINVAL; switch (prog->type) { case BPF_PROG_TYPE_TRACING: @@ -20663,14 +20663,14 @@ static int check_attach_btf_id(struct bpf_verifier_env *env) u64 key; if (prog->type == BPF_PROG_TYPE_SYSCALL) { - if (prog->aux->sleepable) + if (prog->sleepable) /* attach_btf_id checked to be zero already */ return 0; verbose(env, "Syscall programs can only be sleepable\n"); return -EINVAL; } - if (prog->aux->sleepable && !can_be_sleepable(prog)) { + if (prog->sleepable && !can_be_sleepable(prog)) { verbose(env, "Only fentry/fexit/fmod_ret, lsm, iter, uprobe, and struct_ops programs can be sleepable\n"); return -EINVAL; } diff --git a/kernel/events/core.c b/kernel/events/core.c index cdd34d6e3dd4..1c9c47207b1c 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -10672,7 +10672,7 @@ static int __perf_event_set_bpf_prog(struct perf_event *event, (is_syscall_tp && prog->type != BPF_PROG_TYPE_TRACEPOINT)) return -EINVAL; - if (prog->type == BPF_PROG_TYPE_KPROBE && prog->aux->sleepable && !is_uprobe) + if (prog->type == BPF_PROG_TYPE_KPROBE && prog->sleepable && !is_uprobe) /* only uprobe programs are allowed to be sleepable */ return -EINVAL; diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index fe21203dbc31..af5436c68339 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -3111,7 +3111,7 @@ static int uprobe_prog_run(struct bpf_uprobe *uprobe, .uprobe = uprobe, }; struct bpf_prog *prog = link->link.prog; - bool sleepable = prog->aux->sleepable; + bool sleepable = prog->sleepable; struct bpf_run_ctx *old_run_ctx; if (link->task && current->mm != link->task->mm) diff --git a/net/bpf/bpf_dummy_struct_ops.c b/net/bpf/bpf_dummy_struct_ops.c index 1b5f812e6972..de33dc1b0daa 100644 --- a/net/bpf/bpf_dummy_struct_ops.c +++ b/net/bpf/bpf_dummy_struct_ops.c @@ -174,7 +174,7 @@ static int bpf_dummy_ops_check_member(const struct btf_type *t, case offsetof(struct bpf_dummy_ops, test_sleepable): break; default: - if (prog->aux->sleepable) + if (prog->sleepable) return -EINVAL; } -- 2.34.1
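A minimal sketch of the layout change (simplified, hypothetical struct names, not the real definitions): the run-time check now reads the flag from the program struct itself instead of chasing the aux pointer first.

struct prog_aux_sketch {
	unsigned int sleepable:1;	/* old home of the flag */
};

struct prog_sketch {
	unsigned int sleepable:1;	/* new home of the flag */
	struct prog_aux_sketch *aux;
};

static inline int prog_is_sleepable(const struct prog_sketch *prog)
{
	return prog->sleepable;		/* was: prog->aux->sleepable */
}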
hulk inclusion category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA -------------------------------- Fix KABI breakage by using KABI_FILL_HOLE and KABI_DEPRECATE. Fixes: 62b1bcfc7f25 ("bpf: move sleepable flag from bpf_prog_aux to bpf_prog") Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- include/linux/bpf.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 61955afeb201..2a5a9aaaf6bd 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1528,6 +1528,7 @@ struct bpf_prog_aux { bool offload_requested; /* Program is bound and offloaded to the netdev. */ bool attach_btf_trace; /* true if attaching to BTF-enabled raw tp */ bool func_proto_unreliable; + KABI_DEPRECATE(bool, sleepable) bool tail_call_reachable; bool xdp_has_frags; /* BTF_KIND_FUNC_PROTO for valid attach_btf_id */ @@ -1621,13 +1622,13 @@ struct bpf_prog { enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */ call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */ call_get_func_ip:1, /* Do we call get_func_ip() */ - tstamp_type_access:1, /* Accessed __sk_buff->tstamp_type */ - sleepable:1; /* BPF program is sleepable */ + tstamp_type_access:1; /* Accessed __sk_buff->tstamp_type */ enum bpf_prog_type type; /* Type of BPF program */ enum bpf_attach_type expected_attach_type; /* For some prog types */ u32 len; /* Number of filter blocks */ u32 jited_len; /* Size of jited insns in bytes */ u8 tag[BPF_TAG_SIZE]; + KABI_FILL_HOLE(u8 sleepable:1) /* BPF program is sleepable */ struct bpf_prog_stats __percpu *stats; int __percpu *active; unsigned int (*bpf_func)(const void *ctx, -- 2.34.1
From: Martin KaFai Lau <martin.lau@kernel.org> mainline inclusion from mainline-v6.10-rc1 commit 7f3edd0c72c3f7214f8f28495f2e6466348eb128 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- There is an "if (err)" check earlier, so the "if (err < 0)" check removed by this patch is unnecessary. It was my oversight when making adjustments to bpf_struct_ops_prepare_trampoline() so that the caller does not have to worry about the new page when the function returns an error. Fixes: 187e2af05abe ("bpf: struct_ops supports more than one page for trampolines.") Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20240315192112.2825039-1-martin.lau@linux.dev Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- kernel/bpf/bpf_struct_ops.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c index cb6deeead404..d9607514b380 100644 --- a/kernel/bpf/bpf_struct_ops.c +++ b/kernel/bpf/bpf_struct_ops.c @@ -723,8 +723,6 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, cur_image = image; trampoline_start = 0; } - if (err < 0) - goto reset_unlock; *(void **)(kdata + moff) = image + trampoline_start + cfi_get_offset(); -- 2.34.1
From: Christophe Leroy <christophe.leroy@csgroup.eu> mainline inclusion from mainline-v6.10-rc1 commit e3362acd796789dc0562eb1a3937007b0beb0c5b category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Last user of arch_unprotect_bpf_trampoline() was removed by commit 187e2af05abe ("bpf: struct_ops supports more than one page for trampolines.") Remove arch_unprotect_bpf_trampoline() Reported-by: Daniel Borkmann <daniel@iogearbox.net> Fixes: 187e2af05abe ("bpf: struct_ops supports more than one page for trampolines.") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Link: https://lore.kernel.org/r/42c635bb54d3af91db0f9b85d724c7c290069f67.171057435... Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Conflicts: arch/arm64/net/bpf_jit_comp.c [Fix conflicts because of 96b0f5addc7a ("arm64, bpf: Use bpf_prog_pack for arm64 bpf trampoline") is not merged.] Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- arch/x86/net/bpf_jit_comp.c | 4 ---- include/linux/bpf.h | 1 - kernel/bpf/trampoline.c | 7 ------- 3 files changed, 12 deletions(-) diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index 5c020dcec31b..132c9ec9d926 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -2813,10 +2813,6 @@ void arch_protect_bpf_trampoline(void *image, unsigned int size) { } -void arch_unprotect_bpf_trampoline(void *image, unsigned int size) -{ -} - int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *image_end, const struct btf_func_model *m, u32 flags, struct bpf_tramp_links *tlinks, diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 2a5a9aaaf6bd..c67235157b35 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1170,7 +1170,6 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i void *arch_alloc_bpf_trampoline(unsigned int size); void arch_free_bpf_trampoline(void *image, unsigned int size); void arch_protect_bpf_trampoline(void *image, unsigned int size); -void arch_unprotect_bpf_trampoline(void *image, unsigned int size); int arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags, struct bpf_tramp_links *tlinks, void *func_addr); diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c index afa5543cbcb1..d62643031605 100644 --- a/kernel/bpf/trampoline.c +++ b/kernel/bpf/trampoline.c @@ -1105,13 +1105,6 @@ void __weak arch_protect_bpf_trampoline(void *image, unsigned int size) set_memory_rox((long)image, 1); } -void __weak arch_unprotect_bpf_trampoline(void *image, unsigned int size) -{ - WARN_ON_ONCE(size > PAGE_SIZE); - set_memory_nx((long)image, 1); - set_memory_rw((long)image, 1); -} - int __weak arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags, struct bpf_tramp_links *tlinks, void *func_addr) { -- 2.34.1
From: Christophe Leroy <christophe.leroy@csgroup.eu> mainline inclusion from mainline-v6.10-rc1 commit c733239f8f530872a1f80d8c45dcafbaff368737 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- arch_protect_bpf_trampoline() and alloc_new_pack() call set_memory_rox() which can fail, leading to unprotected memory. Take into account return from set_memory_rox() function and add __must_check flag to arch_protect_bpf_trampoline(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/fe1c163c83767fde5cab31d209a4a6be3ddb3a73.171057435... Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Conflicts: arch/arm64/net/bpf_jit_comp.c [Fix conflicts because of 96b0f5addc7a ("arm64, bpf: Use bpf_prog_pack for arm64 bpf trampoline") is not merged.] Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- arch/x86/net/bpf_jit_comp.c | 3 ++- include/linux/bpf.h | 2 +- kernel/bpf/bpf_struct_ops.c | 8 ++++++-- kernel/bpf/core.c | 28 +++++++++++++++++++++------- kernel/bpf/trampoline.c | 8 +++++--- net/bpf/bpf_dummy_struct_ops.c | 4 +++- 6 files changed, 38 insertions(+), 15 deletions(-) diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index 132c9ec9d926..a07c2224d839 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -2809,8 +2809,9 @@ void arch_free_bpf_trampoline(void *image, unsigned int size) bpf_prog_pack_free(image, size); } -void arch_protect_bpf_trampoline(void *image, unsigned int size) +int arch_protect_bpf_trampoline(void *image, unsigned int size) { + return 0; } int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *image_end, diff --git a/include/linux/bpf.h b/include/linux/bpf.h index c67235157b35..0f39e0e77d1e 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1169,7 +1169,7 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i void *func_addr); void *arch_alloc_bpf_trampoline(unsigned int size); void arch_free_bpf_trampoline(void *image, unsigned int size); -void arch_protect_bpf_trampoline(void *image, unsigned int size); +int __must_check arch_protect_bpf_trampoline(void *image, unsigned int size); int arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags, struct bpf_tramp_links *tlinks, void *func_addr); diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c index d9607514b380..1409a844183c 100644 --- a/kernel/bpf/bpf_struct_ops.c +++ b/kernel/bpf/bpf_struct_ops.c @@ -735,8 +735,12 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key, if (err) goto reset_unlock; } - for (i = 0; i < st_map->image_pages_cnt; i++) - arch_protect_bpf_trampoline(st_map->image_pages[i], PAGE_SIZE); + for (i = 0; i < st_map->image_pages_cnt; i++) { + err = arch_protect_bpf_trampoline(st_map->image_pages[i], + PAGE_SIZE); + if (err) + goto reset_unlock; + } if (st_map->map.map_flags & BPF_F_LINK) { err = 0; diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 55e2b667b823..13649240750a 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -908,23 +908,30 @@ static LIST_HEAD(pack_list); static struct bpf_prog_pack *alloc_new_pack(bpf_jit_fill_hole_t bpf_fill_ill_insns) { struct bpf_prog_pack *pack; + int err; pack = kzalloc(struct_size(pack, bitmap, 
BITS_TO_LONGS(BPF_PROG_CHUNK_COUNT)), GFP_KERNEL); if (!pack) return NULL; pack->ptr = bpf_jit_alloc_exec(BPF_PROG_PACK_SIZE); - if (!pack->ptr) { - kfree(pack); - return NULL; - } + if (!pack->ptr) + goto out; bpf_fill_ill_insns(pack->ptr, BPF_PROG_PACK_SIZE); bitmap_zero(pack->bitmap, BPF_PROG_PACK_SIZE / BPF_PROG_CHUNK_SIZE); - list_add_tail(&pack->list, &pack_list); set_vm_flush_reset_perms(pack->ptr); - set_memory_rox((unsigned long)pack->ptr, BPF_PROG_PACK_SIZE / PAGE_SIZE); + err = set_memory_rox((unsigned long)pack->ptr, + BPF_PROG_PACK_SIZE / PAGE_SIZE); + if (err) + goto out; + list_add_tail(&pack->list, &pack_list); return pack; + +out: + bpf_jit_free_exec(pack->ptr); + kfree(pack); + return NULL; } void *bpf_prog_pack_alloc(u32 size, bpf_jit_fill_hole_t bpf_fill_ill_insns) @@ -939,9 +946,16 @@ void *bpf_prog_pack_alloc(u32 size, bpf_jit_fill_hole_t bpf_fill_ill_insns) size = round_up(size, PAGE_SIZE); ptr = bpf_jit_alloc_exec(size); if (ptr) { + int err; + bpf_fill_ill_insns(ptr, size); set_vm_flush_reset_perms(ptr); - set_memory_rox((unsigned long)ptr, size / PAGE_SIZE); + err = set_memory_rox((unsigned long)ptr, + size / PAGE_SIZE); + if (err) { + bpf_jit_free_exec(ptr); + ptr = NULL; + } } goto out; } diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c index d62643031605..f28e060f0989 100644 --- a/kernel/bpf/trampoline.c +++ b/kernel/bpf/trampoline.c @@ -456,7 +456,9 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr, bool lock_direct_mut if (err < 0) goto out_free; - arch_protect_bpf_trampoline(im->image, im->size); + err = arch_protect_bpf_trampoline(im->image, im->size); + if (err) + goto out_free; WARN_ON(tr->cur_image && total == 0); if (tr->cur_image) @@ -1099,10 +1101,10 @@ void __weak arch_free_bpf_trampoline(void *image, unsigned int size) bpf_jit_free_exec(image); } -void __weak arch_protect_bpf_trampoline(void *image, unsigned int size) +int __weak arch_protect_bpf_trampoline(void *image, unsigned int size) { WARN_ON_ONCE(size > PAGE_SIZE); - set_memory_rox((long)image, 1); + return set_memory_rox((long)image, 1); } int __weak arch_bpf_trampoline_size(const struct btf_func_model *m, u32 flags, diff --git a/net/bpf/bpf_dummy_struct_ops.c b/net/bpf/bpf_dummy_struct_ops.c index de33dc1b0daa..25b75844891a 100644 --- a/net/bpf/bpf_dummy_struct_ops.c +++ b/net/bpf/bpf_dummy_struct_ops.c @@ -133,7 +133,9 @@ int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr, if (err < 0) goto out; - arch_protect_bpf_trampoline(image, PAGE_SIZE); + err = arch_protect_bpf_trampoline(image, PAGE_SIZE); + if (err) + goto out; prog_ret = dummy_ops_call_op(image, args); err = dummy_ops_copy_args(args); -- 2.34.1
From: Alexander Lobakin <aleksander.lobakin@intel.com> mainline inclusion from mainline-v6.10-rc1 commit 7d8296b250f2eed73f1758607926d4d258dea5d4 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Avoid open-coding that simple expression each time by moving BYTES_TO_BITS() from the probes code to <linux/bitops.h> to export it to the rest of the kernel. Simplify the macro while at it. `BITS_PER_LONG / sizeof(long)` always equals to %BITS_PER_BYTE, regardless of the target architecture. Do the same for the tools ecosystem as well (incl. its version of bitops.h). The previous implementation had its implicit type of long, while the new one is int, so adjust the format literal accordingly in the perf code. Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Acked-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- include/linux/bitops.h | 2 ++ kernel/trace/trace_probe.c | 2 -- tools/include/linux/bitops.h | 2 ++ tools/perf/util/probe-finder.c | 4 +--- 4 files changed, 5 insertions(+), 5 deletions(-) diff --git a/include/linux/bitops.h b/include/linux/bitops.h index b2342eebc8d2..07729a31198e 100644 --- a/include/linux/bitops.h +++ b/include/linux/bitops.h @@ -20,6 +20,8 @@ #define BITS_TO_U32(nr) __KERNEL_DIV_ROUND_UP(nr, BITS_PER_TYPE(u32)) #define BITS_TO_BYTES(nr) __KERNEL_DIV_ROUND_UP(nr, BITS_PER_TYPE(char)) +#define BYTES_TO_BITS(nb) ((nb) * BITS_PER_BYTE) + extern unsigned int __sw_hweight8(unsigned int w); extern unsigned int __sw_hweight16(unsigned int w); extern unsigned int __sw_hweight32(unsigned int w); diff --git a/kernel/trace/trace_probe.c b/kernel/trace/trace_probe.c index 1500c5c060b1..b168d6dd0773 100644 --- a/kernel/trace/trace_probe.c +++ b/kernel/trace/trace_probe.c @@ -1067,8 +1067,6 @@ parse_probe_arg(char *arg, const struct fetch_type *type, return ret; } -#define BYTES_TO_BITS(nb) ((BITS_PER_LONG * (nb)) / sizeof(long)) - /* Bitfield type needs to be parsed into a fetch function */ static int __parse_bitfield_probe_arg(const char *bf, const struct fetch_type *t, diff --git a/tools/include/linux/bitops.h b/tools/include/linux/bitops.h index 7319f6ced108..272f15d0e434 100644 --- a/tools/include/linux/bitops.h +++ b/tools/include/linux/bitops.h @@ -20,6 +20,8 @@ #define BITS_TO_U32(nr) DIV_ROUND_UP(nr, BITS_PER_TYPE(u32)) #define BITS_TO_BYTES(nr) DIV_ROUND_UP(nr, BITS_PER_TYPE(char)) +#define BYTES_TO_BITS(nb) ((nb) * BITS_PER_BYTE) + extern unsigned int __sw_hweight8(unsigned int w); extern unsigned int __sw_hweight16(unsigned int w); extern unsigned int __sw_hweight32(unsigned int w); diff --git a/tools/perf/util/probe-finder.c b/tools/perf/util/probe-finder.c index c0c8d7f9514b..f9c396fe81dc 100644 --- a/tools/perf/util/probe-finder.c +++ b/tools/perf/util/probe-finder.c @@ -304,8 +304,6 @@ static int convert_variable_location(Dwarf_Die *vr_die, Dwarf_Addr addr, return ret2; } -#define BYTES_TO_BITS(nb) ((nb) * BITS_PER_LONG / sizeof(long)) - static int convert_variable_type(Dwarf_Die *vr_die, struct probe_trace_arg *tvar, const char *cast, bool user_access) @@ -335,7 +333,7 @@ static int convert_variable_type(Dwarf_Die *vr_die, total = 
dwarf_bytesize(vr_die); if (boffs < 0 || total < 0) return -ENOENT; - ret = snprintf(buf, 16, "b%d@%d/%zd", bsize, boffs, + ret = snprintf(buf, 16, "b%d@%d/%d", bsize, boffs, BYTES_TO_BITS(total)); goto formatted; } -- 2.34.1
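A quick way to convince yourself of the simplification argument in the commit message (BITS_PER_LONG / sizeof(long) is always BITS_PER_BYTE) is to compare the old and new forms directly. The snippet below redefines both macros locally purely for illustration; it is not the kernel header, and the local BITS_PER_LONG definition is an assumption made only for this standalone check.

#include <assert.h>
#include <limits.h>	/* CHAR_BIT */
#include <stdio.h>

#define BITS_PER_BYTE		CHAR_BIT
#define BITS_PER_LONG		(sizeof(long) * BITS_PER_BYTE)

/* old form, as previously open-coded in trace_probe.c / probe-finder.c */
#define OLD_BYTES_TO_BITS(nb)	((BITS_PER_LONG * (nb)) / sizeof(long))
/* new, simplified form from <linux/bitops.h> */
#define BYTES_TO_BITS(nb)	((nb) * BITS_PER_BYTE)

int main(void)
{
	for (unsigned int nb = 0; nb < 64; nb++)
		assert(OLD_BYTES_TO_BITS(nb) == BYTES_TO_BITS(nb));

	/* old form promoted to size_t (hence the %zd in perf before the
	 * patch); the new form keeps the operand's int type.
	 */
	printf("BYTES_TO_BITS(8) = %u\n", (unsigned int)BYTES_TO_BITS(8)); /* 64 */
	return 0;
}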
From: Alan Maguire <alan.maguire@oracle.com> mainline inclusion from mainline-v6.10-rc1 commit fcd1ed89a0439c45e1336bd9649485c44b7597c7 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- The btf_features list can be used for pahole v1.26 and later - it is useful because if a feature is not yet implemented it will not exit with a failure message. This will allow us to add feature requests to the pahole options without having to check pahole versions in future; if the version of pahole supports the feature it will be added. Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240507135514.490467-1-alan.maguire@oracle.com Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- scripts/Makefile.btf | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/scripts/Makefile.btf b/scripts/Makefile.btf index 82377e470aed..2d6e5ed9081e 100644 --- a/scripts/Makefile.btf +++ b/scripts/Makefile.btf @@ -3,6 +3,8 @@ pahole-ver := $(CONFIG_PAHOLE_VERSION) pahole-flags-y := +ifeq ($(call test-le, $(pahole-ver), 125),y) + # pahole 1.18 through 1.21 can't handle zero-sized per-CPU vars ifeq ($(call test-le, $(pahole-ver), 121),y) pahole-flags-$(call test-ge, $(pahole-ver), 118) += --skip_encoding_btf_vars @@ -12,8 +14,17 @@ pahole-flags-$(call test-ge, $(pahole-ver), 121) += --btf_gen_floats pahole-flags-$(call test-ge, $(pahole-ver), 122) += -j -pahole-flags-$(CONFIG_PAHOLE_HAS_LANG_EXCLUDE) += --lang_exclude=rust +ifeq ($(pahole-ver), 125) +pahole-flags-y += --skip_encoding_btf_inconsistent_proto --btf_gen_optimized +endif + +else -pahole-flags-$(call test-ge, $(pahole-ver), 125) += --skip_encoding_btf_inconsistent_proto --btf_gen_optimized +# Switch to using --btf_features for v1.26 and later. +pahole-flags-$(call test-ge, $(pahole-ver), 126) = -j --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func + +endif + +pahole-flags-$(CONFIG_PAHOLE_HAS_LANG_EXCLUDE) += --lang_exclude=rust export PAHOLE_FLAGS := $(pahole-flags-y) -- 2.34.1
From: Alan Maguire <alan.maguire@oracle.com> mainline inclusion from mainline-v6.11-rc1 commit 34021caef79f76e70ac31247d321ecd0683c4939 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- There is no need to set the pahole v1.25-only flags in an "ifeq" version clause; we are already in a <= v1.25 branch of "ifeq", so that combined with a "test-ge" v1.25 ensures the flags will be applied for v1.25 only. Suggested-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240514162716.2448265-1-alan.maguire@oracle.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- scripts/Makefile.btf | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/scripts/Makefile.btf b/scripts/Makefile.btf index 2d6e5ed9081e..bca8a8f26ea4 100644 --- a/scripts/Makefile.btf +++ b/scripts/Makefile.btf @@ -14,9 +14,7 @@ pahole-flags-$(call test-ge, $(pahole-ver), 121) += --btf_gen_floats pahole-flags-$(call test-ge, $(pahole-ver), 122) += -j -ifeq ($(pahole-ver), 125) -pahole-flags-y += --skip_encoding_btf_inconsistent_proto --btf_gen_optimized -endif +pahole-flags-$(call test-ge, $(pahole-ver), 125) += --skip_encoding_btf_inconsistent_proto --btf_gen_optimized else -- 2.34.1
From: Yafang Shao <laoar.shao@gmail.com> mainline inclusion from mainline-v6.11-rc1 commit 4665415975b0827e9646cab91c61d02a6b364d59 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Add three new kfuncs for the bits iterator: - bpf_iter_bits_new Initialize a new bits iterator for a given memory area. Due to the limitation of bpf memalloc, the max number of words (8-byte units) that can be iterated over is limited to (4096 / 8). - bpf_iter_bits_next Get the next bit in a bpf_iter_bits - bpf_iter_bits_destroy Destroy a bpf_iter_bits The bits iterator facilitates the iteration of the bits of a memory area, such as cpumask. It can be used in any context and on any address. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240517023034.48138-2-laoar.shao@gmail.com Conflicts: kernel/bpf/helpers.c [Fix conflicts because of context diff.] Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- kernel/bpf/helpers.c | 119 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 119 insertions(+) diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 7b79779126a0..364498c643de 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -2625,6 +2625,122 @@ __bpf_kfunc void bpf_rcu_read_unlock(void) rcu_read_unlock(); } +struct bpf_iter_bits { + __u64 __opaque[2]; +} __aligned(8); + +struct bpf_iter_bits_kern { + union { + unsigned long *bits; + unsigned long bits_copy; + }; + u32 nr_bits; + int bit; +} __aligned(8); + +/** + * bpf_iter_bits_new() - Initialize a new bits iterator for a given memory area + * @it: The new bpf_iter_bits to be created + * @unsafe_ptr__ign: A pointer pointing to a memory area to be iterated over + * @nr_words: The size of the specified memory area, measured in 8-byte units. + * Due to the limitation of memalloc, it can't be greater than 512. + * + * This function initializes a new bpf_iter_bits structure for iterating over + * a memory area which is specified by the @unsafe_ptr__ign and @nr_words. It + * copies the data of the memory area to the newly created bpf_iter_bits @it for + * subsequent iteration operations. + * + * On success, 0 is returned. On failure, ERR is returned. 
+ */ +__bpf_kfunc int +bpf_iter_bits_new(struct bpf_iter_bits *it, const u64 *unsafe_ptr__ign, u32 nr_words) +{ + struct bpf_iter_bits_kern *kit = (void *)it; + u32 nr_bytes = nr_words * sizeof(u64); + u32 nr_bits = BYTES_TO_BITS(nr_bytes); + int err; + + BUILD_BUG_ON(sizeof(struct bpf_iter_bits_kern) != sizeof(struct bpf_iter_bits)); + BUILD_BUG_ON(__alignof__(struct bpf_iter_bits_kern) != + __alignof__(struct bpf_iter_bits)); + + kit->nr_bits = 0; + kit->bits_copy = 0; + kit->bit = -1; + + if (!unsafe_ptr__ign || !nr_words) + return -EINVAL; + + /* Optimization for u64 mask */ + if (nr_bits == 64) { + err = bpf_probe_read_kernel_common(&kit->bits_copy, nr_bytes, unsafe_ptr__ign); + if (err) + return -EFAULT; + + kit->nr_bits = nr_bits; + return 0; + } + + /* Fallback to memalloc */ + kit->bits = bpf_mem_alloc(&bpf_global_ma, nr_bytes); + if (!kit->bits) + return -ENOMEM; + + err = bpf_probe_read_kernel_common(kit->bits, nr_bytes, unsafe_ptr__ign); + if (err) { + bpf_mem_free(&bpf_global_ma, kit->bits); + return err; + } + + kit->nr_bits = nr_bits; + return 0; +} + +/** + * bpf_iter_bits_next() - Get the next bit in a bpf_iter_bits + * @it: The bpf_iter_bits to be checked + * + * This function returns a pointer to a number representing the value of the + * next bit in the bits. + * + * If there are no further bits available, it returns NULL. + */ +__bpf_kfunc int *bpf_iter_bits_next(struct bpf_iter_bits *it) +{ + struct bpf_iter_bits_kern *kit = (void *)it; + u32 nr_bits = kit->nr_bits; + const unsigned long *bits; + int bit; + + if (nr_bits == 0) + return NULL; + + bits = nr_bits == 64 ? &kit->bits_copy : kit->bits; + bit = find_next_bit(bits, nr_bits, kit->bit + 1); + if (bit >= nr_bits) { + kit->nr_bits = 0; + return NULL; + } + + kit->bit = bit; + return &kit->bit; +} + +/** + * bpf_iter_bits_destroy() - Destroy a bpf_iter_bits + * @it: The bpf_iter_bits to be destroyed + * + * Destroy the resource associated with the bpf_iter_bits. + */ +__bpf_kfunc void bpf_iter_bits_destroy(struct bpf_iter_bits *it) +{ + struct bpf_iter_bits_kern *kit = (void *)it; + + if (kit->nr_bits <= 64) + return; + bpf_mem_free(&bpf_global_ma, kit->bits); +} + __bpf_kfunc_end_defs(); BTF_KFUNCS_START(generic_btf_ids) @@ -2700,6 +2816,9 @@ BTF_ID_FLAGS(func, bpf_dynptr_is_null) BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly) BTF_ID_FLAGS(func, bpf_dynptr_size) BTF_ID_FLAGS(func, bpf_dynptr_clone) +BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW) +BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY) BTF_KFUNCS_END(common_btf_ids) static const struct btf_kfunc_id_set common_kfunc_set = { -- 2.34.1
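BPF programs normally do not call the three kfuncs directly; the bpf_for_each() open-coded-iterator macro from bpf_helpers.h expands into the new/next/destroy sequence. Below is a minimal usage sketch, assuming a vmlinux.h generated from a kernel carrying this patch; it mirrors the selftests added in the next patch rather than introducing anything new.

// SPDX-License-Identifier: GPL-2.0
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

char _license[] SEC("license") = "GPL";

/* kfunc prototypes, declared the same way the selftests do */
int bpf_iter_bits_new(struct bpf_iter_bits *it, const u64 *unsafe_ptr__ign,
		      u32 nr_words) __ksym __weak;
int *bpf_iter_bits_next(struct bpf_iter_bits *it) __ksym __weak;
void bpf_iter_bits_destroy(struct bpf_iter_bits *it) __ksym __weak;

SEC("syscall")
int count_set_bits(void)
{
	u64 mask = 0xf0f0;	/* 8 bits set */
	int nr = 0;
	int *bit;

	/* iterate one 8-byte word; *bit is the index of each set bit */
	bpf_for_each(bits, bit, &mask, 1)
		nr++;

	return nr;	/* 8 when run via BPF_PROG_TEST_RUN */
}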
From: Yafang Shao <laoar.shao@gmail.com> mainline inclusion from mainline-v6.11-rc1 commit 6ba7acdb93b4ecb554d5838fca3f5f0fcf9fff14 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Add test cases for the bits iter: - Positive cases - Bit mask representing a single word (8-byte unit) - Bit mask representing data spanning more than one word - The index of the set bit - Nagative cases - bpf_iter_bits_destroy() is required after calling bpf_iter_bits_new() - bpf_iter_bits_destroy() can only destroy an initialized iter - bpf_iter_bits_next() must use an initialized iter - Bit mask representing zero words - Bit mask representing fewer words than expected - Case for ENOMEM - Case for NULL pointer Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240517023034.48138-3-laoar.shao@gmail.com Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- .../selftests/bpf/prog_tests/verifier.c | 2 + .../selftests/bpf/progs/verifier_bits_iter.c | 153 ++++++++++++++++++ 2 files changed, 155 insertions(+) create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c diff --git a/tools/testing/selftests/bpf/prog_tests/verifier.c b/tools/testing/selftests/bpf/prog_tests/verifier.c index 8d746642cbd7..6de0aeebbb4a 100644 --- a/tools/testing/selftests/bpf/prog_tests/verifier.c +++ b/tools/testing/selftests/bpf/prog_tests/verifier.c @@ -79,6 +79,7 @@ #include "verifier_xadd.skel.h" #include "verifier_xdp.skel.h" #include "verifier_xdp_direct_packet_access.skel.h" +#include "verifier_bits_iter.skel.h" #define MAX_ENTRIES 11 @@ -188,6 +189,7 @@ void test_verifier_var_off(void) { RUN(verifier_var_off); } void test_verifier_xadd(void) { RUN(verifier_xadd); } void test_verifier_xdp(void) { RUN(verifier_xdp); } void test_verifier_xdp_direct_packet_access(void) { RUN(verifier_xdp_direct_packet_access); } +void test_verifier_bits_iter(void) { RUN(verifier_bits_iter); } static int init_test_val_map(struct bpf_object *obj, char *map_name) { diff --git a/tools/testing/selftests/bpf/progs/verifier_bits_iter.c b/tools/testing/selftests/bpf/progs/verifier_bits_iter.c new file mode 100644 index 000000000000..716113c2bce2 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/verifier_bits_iter.c @@ -0,0 +1,153 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Copyright (c) 2024 Yafang Shao <laoar.shao@gmail.com> */ + +#include "vmlinux.h" +#include <bpf/bpf_helpers.h> +#include <bpf/bpf_tracing.h> + +#include "bpf_misc.h" +#include "task_kfunc_common.h" + +char _license[] SEC("license") = "GPL"; + +int bpf_iter_bits_new(struct bpf_iter_bits *it, const u64 *unsafe_ptr__ign, + u32 nr_bits) __ksym __weak; +int *bpf_iter_bits_next(struct bpf_iter_bits *it) __ksym __weak; +void bpf_iter_bits_destroy(struct bpf_iter_bits *it) __ksym __weak; + +SEC("iter.s/cgroup") +__description("bits iter without destroy") +__failure __msg("Unreleased reference") +int BPF_PROG(no_destroy, struct bpf_iter_meta *meta, struct cgroup *cgrp) +{ + struct bpf_iter_bits it; + u64 data = 1; + + bpf_iter_bits_new(&it, &data, 1); + bpf_iter_bits_next(&it); + return 0; +} + +SEC("iter/cgroup") +__description("uninitialized iter in ->next()") +__failure __msg("expected an initialized iter_bits as arg #1") +int BPF_PROG(next_uninit, struct bpf_iter_meta *meta, struct cgroup 
*cgrp) +{ + struct bpf_iter_bits *it = NULL; + + bpf_iter_bits_next(it); + return 0; +} + +SEC("iter/cgroup") +__description("uninitialized iter in ->destroy()") +__failure __msg("expected an initialized iter_bits as arg #1") +int BPF_PROG(destroy_uninit, struct bpf_iter_meta *meta, struct cgroup *cgrp) +{ + struct bpf_iter_bits it = {}; + + bpf_iter_bits_destroy(&it); + return 0; +} + +SEC("syscall") +__description("null pointer") +__success __retval(0) +int null_pointer(void) +{ + int nr = 0; + int *bit; + + bpf_for_each(bits, bit, NULL, 1) + nr++; + return nr; +} + +SEC("syscall") +__description("bits copy") +__success __retval(10) +int bits_copy(void) +{ + u64 data = 0xf7310UL; /* 4 + 3 + 2 + 1 + 0*/ + int nr = 0; + int *bit; + + bpf_for_each(bits, bit, &data, 1) + nr++; + return nr; +} + +SEC("syscall") +__description("bits memalloc") +__success __retval(64) +int bits_memalloc(void) +{ + u64 data[2]; + int nr = 0; + int *bit; + + __builtin_memset(&data, 0xf0, sizeof(data)); /* 4 * 16 */ + bpf_for_each(bits, bit, &data[0], sizeof(data) / sizeof(u64)) + nr++; + return nr; +} + +SEC("syscall") +__description("bit index") +__success __retval(8) +int bit_index(void) +{ + u64 data = 0x100; + int bit_idx = 0; + int *bit; + + bpf_for_each(bits, bit, &data, 1) { + if (*bit == 0) + continue; + bit_idx = *bit; + } + return bit_idx; +} + +SEC("syscall") +__description("bits nomem") +__success __retval(0) +int bits_nomem(void) +{ + u64 data[4]; + int nr = 0; + int *bit; + + __builtin_memset(&data, 0xff, sizeof(data)); + bpf_for_each(bits, bit, &data[0], 513) /* Be greater than 512 */ + nr++; + return nr; +} + +SEC("syscall") +__description("fewer words") +__success __retval(1) +int fewer_words(void) +{ + u64 data[2] = {0x1, 0xff}; + int nr = 0; + int *bit; + + bpf_for_each(bits, bit, &data[0], 1) + nr++; + return nr; +} + +SEC("syscall") +__description("zero words") +__success __retval(0) +int zero_words(void) +{ + u64 data[2] = {0x1, 0xff}; + int nr = 0; + int *bit; + + bpf_for_each(bits, bit, &data[0], 0) + nr++; + return nr; +} -- 2.34.1
hulk inclusion category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA -------------------------------- The original change comes from commit 80a4129fcf20 ("selftests/bpf: Add unit tests for bpf_arena_alloc/free_pages"). This commit fixes test_loader, which didn't support running BPF_PROG_TYPE_SYSCALL programs. Fixes: 54b0c6cdfd3b ("selftests/bpf: Add selftest for bits iter") Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- tools/testing/selftests/bpf/test_loader.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/bpf/test_loader.c b/tools/testing/selftests/bpf/test_loader.c index b4edd8454934..5cbd0ab1a4c9 100644 --- a/tools/testing/selftests/bpf/test_loader.c +++ b/tools/testing/selftests/bpf/test_loader.c @@ -480,7 +480,7 @@ static bool is_unpriv_capable_map(struct bpf_map *map) } } -static int do_prog_test_run(int fd_prog, int *retval) +static int do_prog_test_run(int fd_prog, int *retval, bool empty_opts) { __u8 tmp_out[TEST_DATA_LEN << 2] = {}; __u8 tmp_in[TEST_DATA_LEN] = {}; @@ -493,6 +493,11 @@ static int do_prog_test_run(int fd_prog, int *retval) .repeat = 1, ); + if (empty_opts) { + memset(&topts, 0, sizeof(struct bpf_test_run_opts)); + topts.sz = sizeof(struct bpf_test_run_opts); + } + err = bpf_prog_test_run_opts(fd_prog, &topts); saved_errno = errno; @@ -625,7 +630,8 @@ void run_subtest(struct test_loader *tester, } } - do_prog_test_run(bpf_program__fd(tprog), &retval); + do_prog_test_run(bpf_program__fd(tprog), &retval, + bpf_program__type(tprog) == BPF_PROG_TYPE_SYSCALL ? true : false); if (retval != subspec->retval && subspec->retval != POINTER_VALUE) { PRINT_FAIL("Unexpected retval: %d != %d\n", retval, subspec->retval); goto tobj_cleanup; -- 2.34.1
From: Andrii Nakryiko <andrii@kernel.org> mainline inclusion from mainline-v6.11-rc1 commit 68153bb2fffbe59804370e514482f95c4b2053ff category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Implement iterator-based type ID and string offset BTF field iterator. This is used extensively in BTF-handling code and BPF linker code for various sanity checks, rewriting IDs/offsets, etc. Currently this is implemented as visitor pattern calling custom callbacks, which makes the logic (especially in simple cases) unnecessarily obscure and harder to follow. Having equivalent functionality using iterator pattern makes for simpler to understand and maintain code. As we add more code for BTF processing logic in libbpf, it's best to switch to iterator pattern before adding more callback-based code. The idea for iterator-based implementation is to record offsets of necessary fields within fixed btf_type parts (which should be iterated just once), and, for kinds that have multiple members (based on vlen field), record where in each member necessary fields are located. Generic iteration code then just keeps track of last offset that was returned and handles N members correctly. Return type is just u32 pointer, where NULL is returned when all relevant fields were already iterated. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Alan Maguire <alan.maguire@oracle.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20240605001629.4061937-2-andrii@kernel.org Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- tools/lib/bpf/btf.c | 162 ++++++++++++++++++++++++++++++++ tools/lib/bpf/libbpf_internal.h | 24 +++++ 2 files changed, 186 insertions(+) diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c index 2fe479fa0c77..daf28445b9f0 100644 --- a/tools/lib/bpf/btf.c +++ b/tools/lib/bpf/btf.c @@ -4965,6 +4965,168 @@ int btf_type_visit_str_offs(struct btf_type *t, str_off_visit_fn visit, void *ct return 0; } +int btf_field_iter_init(struct btf_field_iter *it, struct btf_type *t, enum btf_field_iter_kind iter_kind) +{ + it->p = NULL; + it->m_idx = -1; + it->off_idx = 0; + it->vlen = 0; + + switch (iter_kind) { + case BTF_FIELD_ITER_IDS: + switch (btf_kind(t)) { + case BTF_KIND_UNKN: + case BTF_KIND_INT: + case BTF_KIND_FLOAT: + case BTF_KIND_ENUM: + case BTF_KIND_ENUM64: + it->desc = (struct btf_field_desc) {}; + break; + case BTF_KIND_FWD: + case BTF_KIND_CONST: + case BTF_KIND_VOLATILE: + case BTF_KIND_RESTRICT: + case BTF_KIND_PTR: + case BTF_KIND_TYPEDEF: + case BTF_KIND_FUNC: + case BTF_KIND_VAR: + case BTF_KIND_DECL_TAG: + case BTF_KIND_TYPE_TAG: + it->desc = (struct btf_field_desc) { 1, {offsetof(struct btf_type, type)} }; + break; + case BTF_KIND_ARRAY: + it->desc = (struct btf_field_desc) { + 2, {sizeof(struct btf_type) + offsetof(struct btf_array, type), + sizeof(struct btf_type) + offsetof(struct btf_array, index_type)} + }; + break; + case BTF_KIND_STRUCT: + case BTF_KIND_UNION: + it->desc = (struct btf_field_desc) { + 0, {}, + sizeof(struct btf_member), + 1, {offsetof(struct btf_member, type)} + }; + break; + case BTF_KIND_FUNC_PROTO: + it->desc = (struct btf_field_desc) { + 1, {offsetof(struct btf_type, type)}, + sizeof(struct btf_param), + 1, {offsetof(struct btf_param, type)} 
+ }; + break; + case BTF_KIND_DATASEC: + it->desc = (struct btf_field_desc) { + 0, {}, + sizeof(struct btf_var_secinfo), + 1, {offsetof(struct btf_var_secinfo, type)} + }; + break; + default: + return -EINVAL; + } + break; + case BTF_FIELD_ITER_STRS: + switch (btf_kind(t)) { + case BTF_KIND_UNKN: + it->desc = (struct btf_field_desc) {}; + break; + case BTF_KIND_INT: + case BTF_KIND_FLOAT: + case BTF_KIND_FWD: + case BTF_KIND_ARRAY: + case BTF_KIND_CONST: + case BTF_KIND_VOLATILE: + case BTF_KIND_RESTRICT: + case BTF_KIND_PTR: + case BTF_KIND_TYPEDEF: + case BTF_KIND_FUNC: + case BTF_KIND_VAR: + case BTF_KIND_DECL_TAG: + case BTF_KIND_TYPE_TAG: + case BTF_KIND_DATASEC: + it->desc = (struct btf_field_desc) { + 1, {offsetof(struct btf_type, name_off)} + }; + break; + case BTF_KIND_ENUM: + it->desc = (struct btf_field_desc) { + 1, {offsetof(struct btf_type, name_off)}, + sizeof(struct btf_enum), + 1, {offsetof(struct btf_enum, name_off)} + }; + break; + case BTF_KIND_ENUM64: + it->desc = (struct btf_field_desc) { + 1, {offsetof(struct btf_type, name_off)}, + sizeof(struct btf_enum64), + 1, {offsetof(struct btf_enum64, name_off)} + }; + break; + case BTF_KIND_STRUCT: + case BTF_KIND_UNION: + it->desc = (struct btf_field_desc) { + 1, {offsetof(struct btf_type, name_off)}, + sizeof(struct btf_member), + 1, {offsetof(struct btf_member, name_off)} + }; + break; + case BTF_KIND_FUNC_PROTO: + it->desc = (struct btf_field_desc) { + 1, {offsetof(struct btf_type, name_off)}, + sizeof(struct btf_param), + 1, {offsetof(struct btf_param, name_off)} + }; + break; + default: + return -EINVAL; + } + break; + default: + return -EINVAL; + } + + if (it->desc.m_sz) + it->vlen = btf_vlen(t); + + it->p = t; + return 0; +} + +__u32 *btf_field_iter_next(struct btf_field_iter *it) +{ + if (!it->p) + return NULL; + + if (it->m_idx < 0) { + if (it->off_idx < it->desc.t_cnt) + return it->p + it->desc.t_offs[it->off_idx++]; + /* move to per-member iteration */ + it->m_idx = 0; + it->p += sizeof(struct btf_type); + it->off_idx = 0; + } + + /* if type doesn't have members, stop */ + if (it->desc.m_sz == 0) { + it->p = NULL; + return NULL; + } + + if (it->off_idx >= it->desc.m_cnt) { + /* exhausted this member's fields, go to the next member */ + it->m_idx++; + it->p += it->desc.m_sz; + it->off_idx = 0; + } + + if (it->m_idx < it->vlen) + return it->p + it->desc.m_offs[it->off_idx++]; + + it->p = NULL; + return NULL; +} + int btf_ext_visit_type_ids(struct btf_ext *btf_ext, type_id_visit_fn visit, void *ctx) { const struct btf_ext_info *seg; diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h index 57dec645d687..193a04a1e0d2 100644 --- a/tools/lib/bpf/libbpf_internal.h +++ b/tools/lib/bpf/libbpf_internal.h @@ -486,6 +486,30 @@ struct bpf_line_info_min { __u32 line_col; }; +enum btf_field_iter_kind { + BTF_FIELD_ITER_IDS, + BTF_FIELD_ITER_STRS, +}; + +struct btf_field_desc { + /* once-per-type offsets */ + int t_cnt, t_offs[2]; + /* member struct size, or zero, if no members */ + int m_sz; + /* repeated per-member offsets */ + int m_cnt, m_offs[1]; +}; + +struct btf_field_iter { + struct btf_field_desc desc; + void *p; + int m_idx; + int off_idx; + int vlen; +}; + +int btf_field_iter_init(struct btf_field_iter *it, struct btf_type *t, enum btf_field_iter_kind iter_kind); +__u32 *btf_field_iter_next(struct btf_field_iter *it); typedef int (*type_id_visit_fn)(__u32 *type_id, void *ctx); typedef int (*str_off_visit_fn)(__u32 *str_off, void *ctx); -- 2.34.1
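For orientation, this is roughly the shape every converted call site takes; the iterator is libbpf-internal (declared in libbpf_internal.h), so this only builds inside libbpf itself, and shift_type_ids() is a made-up helper used purely for illustration. The follow-up patches convert linker.c, btf.c and bpftool to exactly this pattern.

/* Hypothetical libbpf-internal helper: bump every type ID referenced by
 * @t by @off, skipping VOID (ID 0). Shows only the init/next pattern.
 */
static int shift_type_ids(struct btf_type *t, __u32 off)
{
	struct btf_field_iter it;
	__u32 *type_id;
	int err;

	err = btf_field_iter_init(&it, t, BTF_FIELD_ITER_IDS);
	if (err)
		return err;

	while ((type_id = btf_field_iter_next(&it))) {
		if (!*type_id)		/* VOID reference stays VOID */
			continue;
		*type_id += off;
	}
	return 0;
}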
From: Andrii Nakryiko <andrii@kernel.org> mainline inclusion from mainline-v6.11-rc1 commit 2bce2c1cb2f0acbf619737a10575f99df0c43984 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Switch all BPF linker code dealing with iterating BTF type ID and string offset fields to new btf_field_iter facilities. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Alan Maguire <alan.maguire@oracle.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20240605001629.4061937-3-andrii@kernel.org Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- tools/lib/bpf/btf.c | 4 +-- tools/lib/bpf/libbpf_internal.h | 4 +-- tools/lib/bpf/linker.c | 58 ++++++++++++++++++++------------- 3 files changed, 40 insertions(+), 26 deletions(-) diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c index daf28445b9f0..64e680331e83 100644 --- a/tools/lib/bpf/btf.c +++ b/tools/lib/bpf/btf.c @@ -5099,7 +5099,7 @@ __u32 *btf_field_iter_next(struct btf_field_iter *it) return NULL; if (it->m_idx < 0) { - if (it->off_idx < it->desc.t_cnt) + if (it->off_idx < it->desc.t_off_cnt) return it->p + it->desc.t_offs[it->off_idx++]; /* move to per-member iteration */ it->m_idx = 0; @@ -5113,7 +5113,7 @@ __u32 *btf_field_iter_next(struct btf_field_iter *it) return NULL; } - if (it->off_idx >= it->desc.m_cnt) { + if (it->off_idx >= it->desc.m_off_cnt) { /* exhausted this member's fields, go to the next member */ it->m_idx++; it->p += it->desc.m_sz; diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h index 193a04a1e0d2..32a961e7d389 100644 --- a/tools/lib/bpf/libbpf_internal.h +++ b/tools/lib/bpf/libbpf_internal.h @@ -493,11 +493,11 @@ enum btf_field_iter_kind { struct btf_field_desc { /* once-per-type offsets */ - int t_cnt, t_offs[2]; + int t_off_cnt, t_offs[2]; /* member struct size, or zero, if no members */ int m_sz; /* repeated per-member offsets */ - int m_cnt, m_offs[1]; + int m_off_cnt, m_offs[1]; }; struct btf_field_iter { diff --git a/tools/lib/bpf/linker.c b/tools/lib/bpf/linker.c index e1b3136643aa..7a2c13db6f9d 100644 --- a/tools/lib/bpf/linker.c +++ b/tools/lib/bpf/linker.c @@ -934,19 +934,33 @@ static int check_btf_str_off(__u32 *str_off, void *ctx) static int linker_sanity_check_btf(struct src_obj *obj) { struct btf_type *t; - int i, n, err = 0; + int i, n, err; if (!obj->btf) return 0; n = btf__type_cnt(obj->btf); for (i = 1; i < n; i++) { + struct btf_field_iter it; + __u32 *type_id, *str_off; + t = btf_type_by_id(obj->btf, i); - err = err ?: btf_type_visit_type_ids(t, check_btf_type_id, obj->btf); - err = err ?: btf_type_visit_str_offs(t, check_btf_str_off, obj->btf); + err = btf_field_iter_init(&it, t, BTF_FIELD_ITER_IDS); if (err) return err; + while ((type_id = btf_field_iter_next(&it))) { + if (*type_id >= n) + return -EINVAL; + } + + err = btf_field_iter_init(&it, t, BTF_FIELD_ITER_STRS); + if (err) + return err; + while ((str_off = btf_field_iter_next(&it))) { + if (!btf__str_by_offset(obj->btf, *str_off)) + return -EINVAL; + } } return 0; @@ -2218,26 +2232,10 @@ static int linker_fixup_btf(struct src_obj *obj) return 0; } -static int remap_type_id(__u32 *type_id, void *ctx) -{ - int *id_map = ctx; - int new_id = id_map[*type_id]; - - /* Error out if the type 
wasn't remapped. Ignore VOID which stays VOID. */ - if (new_id == 0 && *type_id != 0) { - pr_warn("failed to find new ID mapping for original BTF type ID %u\n", *type_id); - return -EINVAL; - } - - *type_id = id_map[*type_id]; - - return 0; -} - static int linker_append_btf(struct bpf_linker *linker, struct src_obj *obj) { const struct btf_type *t; - int i, j, n, start_id, id; + int i, j, n, start_id, id, err; const char *name; if (!obj->btf) @@ -2308,9 +2306,25 @@ static int linker_append_btf(struct bpf_linker *linker, struct src_obj *obj) n = btf__type_cnt(linker->btf); for (i = start_id; i < n; i++) { struct btf_type *dst_t = btf_type_by_id(linker->btf, i); + struct btf_field_iter it; + __u32 *type_id; - if (btf_type_visit_type_ids(dst_t, remap_type_id, obj->btf_type_map)) - return -EINVAL; + err = btf_field_iter_init(&it, dst_t, BTF_FIELD_ITER_IDS); + if (err) + return err; + + while ((type_id = btf_field_iter_next(&it))) { + int new_id = obj->btf_type_map[*type_id]; + + /* Error out if the type wasn't remapped. Ignore VOID which stays VOID. */ + if (new_id == 0 && *type_id != 0) { + pr_warn("failed to find new ID mapping for original BTF type ID %u\n", + *type_id); + return -EINVAL; + } + + *type_id = obj->btf_type_map[*type_id]; + } } /* Rewrite VAR/FUNC underlying types (i.e., FUNC's FUNC_PROTO and VAR's -- 2.34.1
From: Andrii Nakryiko <andrii@kernel.org> mainline inclusion from mainline-v6.11-rc1 commit c2641123696b572a3b059e1b45777317ba9f9086 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Use new BTF field iterator logic to replace all the callback-based visitor calls. There is still a .BTF.ext callback-based visitor APIs that should be converted, which will happens as a follow up. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Alan Maguire <alan.maguire@oracle.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20240605001629.4061937-4-andrii@kernel.org Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- tools/lib/bpf/btf.c | 76 ++++++++++++++++++++++++++++++++------------- 1 file changed, 54 insertions(+), 22 deletions(-) diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c index 64e680331e83..08b0c8efb29e 100644 --- a/tools/lib/bpf/btf.c +++ b/tools/lib/bpf/btf.c @@ -1573,9 +1573,8 @@ struct btf_pipe { struct hashmap *str_off_map; /* map string offsets from src to dst */ }; -static int btf_rewrite_str(__u32 *str_off, void *ctx) +static int btf_rewrite_str(struct btf_pipe *p, __u32 *str_off) { - struct btf_pipe *p = ctx; long mapped_off; int off, err; @@ -1608,7 +1607,9 @@ static int btf_rewrite_str(__u32 *str_off, void *ctx) int btf__add_type(struct btf *btf, const struct btf *src_btf, const struct btf_type *src_type) { struct btf_pipe p = { .src = src_btf, .dst = btf }; + struct btf_field_iter it; struct btf_type *t; + __u32 *str_off; int sz, err; sz = btf_type_size(src_type); @@ -1625,26 +1626,17 @@ int btf__add_type(struct btf *btf, const struct btf *src_btf, const struct btf_t memcpy(t, src_type, sz); - err = btf_type_visit_str_offs(t, btf_rewrite_str, &p); + err = btf_field_iter_init(&it, t, BTF_FIELD_ITER_STRS); if (err) return libbpf_err(err); - return btf_commit_type(btf, sz); -} - -static int btf_rewrite_type_ids(__u32 *type_id, void *ctx) -{ - struct btf *btf = ctx; - - if (!*type_id) /* nothing to do for VOID references */ - return 0; + while ((str_off = btf_field_iter_next(&it))) { + err = btf_rewrite_str(&p, str_off); + if (err) + return libbpf_err(err); + } - /* we haven't updated btf's type count yet, so - * btf->start_id + btf->nr_types - 1 is the type ID offset we should - * add to all newly added BTF types - */ - *type_id += btf->start_id + btf->nr_types - 1; - return 0; + return btf_commit_type(btf, sz); } static size_t btf_dedup_identity_hash_fn(long key, void *ctx); @@ -1692,6 +1684,9 @@ int btf__add_btf(struct btf *btf, const struct btf *src_btf) memcpy(t, src_btf->types_data, data_sz); for (i = 0; i < cnt; i++) { + struct btf_field_iter it; + __u32 *type_id, *str_off; + sz = btf_type_size(t); if (sz < 0) { /* unlikely, has to be corrupted src_btf */ @@ -1703,15 +1698,31 @@ int btf__add_btf(struct btf *btf, const struct btf *src_btf) *off = t - btf->types_data; /* add, dedup, and remap strings referenced by this BTF type */ - err = btf_type_visit_str_offs(t, btf_rewrite_str, &p); + err = btf_field_iter_init(&it, t, BTF_FIELD_ITER_STRS); if (err) goto err_out; + while ((str_off = btf_field_iter_next(&it))) { + err = btf_rewrite_str(&p, str_off); + if (err) + goto err_out; + } /* remap all type IDs referenced from this BTF type */ - 
err = btf_type_visit_type_ids(t, btf_rewrite_type_ids, btf); + err = btf_field_iter_init(&it, t, BTF_FIELD_ITER_IDS); if (err) goto err_out; + while ((type_id = btf_field_iter_next(&it))) { + if (!*type_id) /* nothing to do for VOID references */ + continue; + + /* we haven't updated btf's type count yet, so + * btf->start_id + btf->nr_types - 1 is the type ID offset we should + * add to all newly added BTF types + */ + *type_id += btf->start_id + btf->nr_types - 1; + } + /* go to next type data and type offset index entry */ t += sz; off++; @@ -3283,11 +3294,19 @@ static int btf_for_each_str_off(struct btf_dedup *d, str_off_visit_fn fn, void * int i, r; for (i = 0; i < d->btf->nr_types; i++) { + struct btf_field_iter it; struct btf_type *t = btf_type_by_id(d->btf, d->btf->start_id + i); + __u32 *str_off; - r = btf_type_visit_str_offs(t, fn, ctx); + r = btf_field_iter_init(&it, t, BTF_FIELD_ITER_STRS); if (r) return r; + + while ((str_off = btf_field_iter_next(&it))) { + r = fn(str_off, ctx); + if (r) + return r; + } } if (!d->btf_ext) @@ -4765,10 +4784,23 @@ static int btf_dedup_remap_types(struct btf_dedup *d) for (i = 0; i < d->btf->nr_types; i++) { struct btf_type *t = btf_type_by_id(d->btf, d->btf->start_id + i); + struct btf_field_iter it; + __u32 *type_id; - r = btf_type_visit_type_ids(t, btf_dedup_remap_type_id, d); + r = btf_field_iter_init(&it, t, BTF_FIELD_ITER_IDS); if (r) return r; + + while ((type_id = btf_field_iter_next(&it))) { + __u32 resolved_id, new_id; + + resolved_id = resolve_type_id(d, *type_id); + new_id = d->hypot_map[resolved_id]; + if (new_id > BTF_MAX_NR_TYPES) + return -EINVAL; + + *type_id = new_id; + } } if (!d->btf_ext) -- 2.34.1
From: Andrii Nakryiko <andrii@kernel.org> mainline inclusion from mainline-v6.11-rc1 commit e1a8630291fde2a0edac2955e3df48587dac9906 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Switch bpftool's code which is using libbpf-internal btf_type_visit_type_ids() helper to new btf_field_iter functionality. This makes bpftool code simpler, but also unblocks removing libbpf's btf_type_visit_type_ids() helper completely. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Alan Maguire <alan.maguire@oracle.com> Reviewed-by: Quentin Monnet <qmo@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20240605001629.4061937-5-andrii@kernel.org Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- tools/bpf/bpftool/gen.c | 16 ++++++---------- 1 file changed, 6 insertions(+), 10 deletions(-) diff --git a/tools/bpf/bpftool/gen.c b/tools/bpf/bpftool/gen.c index 34515fbe9414..c2005feda012 100644 --- a/tools/bpf/bpftool/gen.c +++ b/tools/bpf/bpftool/gen.c @@ -2359,15 +2359,6 @@ static int btfgen_record_obj(struct btfgen_info *info, const char *obj_path) return err; } -static int btfgen_remap_id(__u32 *type_id, void *ctx) -{ - unsigned int *ids = ctx; - - *type_id = ids[*type_id]; - - return 0; -} - /* Generate BTF from relocation information previously recorded */ static struct btf *btfgen_get_btf(struct btfgen_info *info) { @@ -2447,10 +2438,15 @@ static struct btf *btfgen_get_btf(struct btfgen_info *info) /* second pass: fix up type ids */ for (i = 1; i < btf__type_cnt(btf_new); i++) { struct btf_type *btf_type = (struct btf_type *) btf__type_by_id(btf_new, i); + struct btf_field_iter it; + __u32 *type_id; - err = btf_type_visit_type_ids(btf_type, btfgen_remap_id, ids); + err = btf_field_iter_init(&it, btf_type, BTF_FIELD_ITER_IDS); if (err) goto err_out; + + while ((type_id = btf_field_iter_next(&it))) + *type_id = ids[*type_id]; } free(ids); -- 2.34.1
From: Andrii Nakryiko <andrii@kernel.org> mainline inclusion from mainline-v6.11-rc1 commit 072088704433f75dacf9e33179dd7a81f0a238d4 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Now that all libbpf/bpftool code switched to btf_field_iter, remove btf_type_visit_type_ids() and btf_type_visit_str_offs() callback-based helpers as not needed anymore. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Alan Maguire <alan.maguire@oracle.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20240605001629.4061937-6-andrii@kernel.org Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- tools/lib/bpf/btf.c | 130 -------------------------------- tools/lib/bpf/libbpf_internal.h | 2 - 2 files changed, 132 deletions(-) diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c index 08b0c8efb29e..d97991fead93 100644 --- a/tools/lib/bpf/btf.c +++ b/tools/lib/bpf/btf.c @@ -4867,136 +4867,6 @@ struct btf *btf__load_module_btf(const char *module_name, struct btf *vmlinux_bt return btf__parse_split(path, vmlinux_btf); } -int btf_type_visit_type_ids(struct btf_type *t, type_id_visit_fn visit, void *ctx) -{ - int i, n, err; - - switch (btf_kind(t)) { - case BTF_KIND_INT: - case BTF_KIND_FLOAT: - case BTF_KIND_ENUM: - case BTF_KIND_ENUM64: - return 0; - - case BTF_KIND_FWD: - case BTF_KIND_CONST: - case BTF_KIND_VOLATILE: - case BTF_KIND_RESTRICT: - case BTF_KIND_PTR: - case BTF_KIND_TYPEDEF: - case BTF_KIND_FUNC: - case BTF_KIND_VAR: - case BTF_KIND_DECL_TAG: - case BTF_KIND_TYPE_TAG: - return visit(&t->type, ctx); - - case BTF_KIND_ARRAY: { - struct btf_array *a = btf_array(t); - - err = visit(&a->type, ctx); - err = err ?: visit(&a->index_type, ctx); - return err; - } - - case BTF_KIND_STRUCT: - case BTF_KIND_UNION: { - struct btf_member *m = btf_members(t); - - for (i = 0, n = btf_vlen(t); i < n; i++, m++) { - err = visit(&m->type, ctx); - if (err) - return err; - } - return 0; - } - - case BTF_KIND_FUNC_PROTO: { - struct btf_param *m = btf_params(t); - - err = visit(&t->type, ctx); - if (err) - return err; - for (i = 0, n = btf_vlen(t); i < n; i++, m++) { - err = visit(&m->type, ctx); - if (err) - return err; - } - return 0; - } - - case BTF_KIND_DATASEC: { - struct btf_var_secinfo *m = btf_var_secinfos(t); - - for (i = 0, n = btf_vlen(t); i < n; i++, m++) { - err = visit(&m->type, ctx); - if (err) - return err; - } - return 0; - } - - default: - return -EINVAL; - } -} - -int btf_type_visit_str_offs(struct btf_type *t, str_off_visit_fn visit, void *ctx) -{ - int i, n, err; - - err = visit(&t->name_off, ctx); - if (err) - return err; - - switch (btf_kind(t)) { - case BTF_KIND_STRUCT: - case BTF_KIND_UNION: { - struct btf_member *m = btf_members(t); - - for (i = 0, n = btf_vlen(t); i < n; i++, m++) { - err = visit(&m->name_off, ctx); - if (err) - return err; - } - break; - } - case BTF_KIND_ENUM: { - struct btf_enum *m = btf_enum(t); - - for (i = 0, n = btf_vlen(t); i < n; i++, m++) { - err = visit(&m->name_off, ctx); - if (err) - return err; - } - break; - } - case BTF_KIND_ENUM64: { - struct btf_enum64 *m = btf_enum64(t); - - for (i = 0, n = btf_vlen(t); i < n; i++, m++) { - err = visit(&m->name_off, ctx); - if (err) - return err; - } - break; - } - case BTF_KIND_FUNC_PROTO: { 
- struct btf_param *m = btf_params(t); - - for (i = 0, n = btf_vlen(t); i < n; i++, m++) { - err = visit(&m->name_off, ctx); - if (err) - return err; - } - break; - } - default: - break; - } - - return 0; -} - int btf_field_iter_init(struct btf_field_iter *it, struct btf_type *t, enum btf_field_iter_kind iter_kind) { it->p = NULL; diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h index 32a961e7d389..2f594b4b732f 100644 --- a/tools/lib/bpf/libbpf_internal.h +++ b/tools/lib/bpf/libbpf_internal.h @@ -513,8 +513,6 @@ __u32 *btf_field_iter_next(struct btf_field_iter *it); typedef int (*type_id_visit_fn)(__u32 *type_id, void *ctx); typedef int (*str_off_visit_fn)(__u32 *str_off, void *ctx); -int btf_type_visit_type_ids(struct btf_type *t, type_id_visit_fn visit, void *ctx); -int btf_type_visit_str_offs(struct btf_type *t, str_off_visit_fn visit, void *ctx); int btf_ext_visit_type_ids(struct btf_ext *btf_ext, type_id_visit_fn visit, void *ctx); int btf_ext_visit_str_offs(struct btf_ext *btf_ext, str_off_visit_fn visit, void *ctx); __s32 btf__find_by_name_kind_own(const struct btf *btf, const char *type_name, -- 2.34.1
From: Alan Maguire <alan.maguire@oracle.com> mainline inclusion from mainline-v6.11-rc1 commit 58e185a0dc359a6c1c9eff348d7badfc9f722159 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- To support more robust split BTF, adding supplemental context for the base BTF type ids that split BTF refers to is required. Without such references, a simple shuffling of base BTF type ids (without any other significant change) invalidates the split BTF. Here the attempt is made to store additional context to make split BTF more robust. This context comes in the form of distilled base BTF providing minimal information (name and - in some cases - size) for base INTs, FLOATs, STRUCTs, UNIONs, ENUMs and ENUM64s along with modified split BTF that points at that base and contains any additional types needed (such as TYPEDEF, PTR and anonymous STRUCT/UNION declarations). This information constitutes the minimal BTF representation needed to disambiguate or remove split BTF references to base BTF. The rules are as follows: - INT, FLOAT, FWD are recorded in full. - if a named base BTF STRUCT or UNION is referred to from split BTF, it will be encoded as a zero-member sized STRUCT/UNION (preserving size for later relocation checks). Only base BTF STRUCT/UNIONs that are either embedded in split BTF STRUCT/UNIONs or that have multiple STRUCT/UNION instances of the same name will _need_ size checks at relocation time, but as it is possible a different set of types will be duplicates in the later to-be-resolved base BTF, we preserve size information for all named STRUCT/UNIONs. - if an ENUM[64] is named, a ENUM forward representation (an ENUM with no values) of the same size is used. - in all other cases, the type is added to the new split BTF. Avoiding struct/union/enum/enum64 expansion is important to keep the distilled base BTF representation to a minimum size. When successful, new representations of the distilled base BTF and new split BTF that refers to it are returned. Both need to be freed by the caller. So to take a simple example, with split BTF with a type referring to "struct sk_buff", we will generate distilled base BTF with a 0-member STRUCT sk_buff of the appropriate size, and the split BTF will refer to it instead. Tools like pahole can utilize such split BTF to populate the .BTF section (split BTF) and an additional .BTF.base section. Then when the split BTF is loaded, the distilled base BTF can be used to relocate split BTF to reference the current (and possibly changed) base BTF. So for example if "struct sk_buff" was id 502 when the split BTF was originally generated, we can use the distilled base BTF to see that id 502 refers to a "struct sk_buff" and replace instances of id 502 with the current (relocated) base BTF sk_buff type id. Distilled base BTF is small; when building a kernel with all modules using distilled base BTF as a test, overall module size grew by only 5.3Mb total across ~2700 modules. 
Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20240613095014.357981-2-alan.maguire@oracle.com Conflicts: tools/lib/bpf/libbpf.map [The conflicts were due to btf__distill_base is LIBBPF_1.5.0 API, but current version of libbpf is LIBBPF_1.3.0] Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- tools/lib/bpf/btf.c | 319 ++++++++++++++++++++++++++++++++++++++- tools/lib/bpf/btf.h | 21 +++ tools/lib/bpf/libbpf.map | 1 + 3 files changed, 335 insertions(+), 6 deletions(-) diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c index d97991fead93..5ef5544578ab 100644 --- a/tools/lib/bpf/btf.c +++ b/tools/lib/bpf/btf.c @@ -1604,9 +1604,8 @@ static int btf_rewrite_str(struct btf_pipe *p, __u32 *str_off) return 0; } -int btf__add_type(struct btf *btf, const struct btf *src_btf, const struct btf_type *src_type) +static int btf_add_type(struct btf_pipe *p, const struct btf_type *src_type) { - struct btf_pipe p = { .src = src_btf, .dst = btf }; struct btf_field_iter it; struct btf_type *t; __u32 *str_off; @@ -1617,10 +1616,10 @@ int btf__add_type(struct btf *btf, const struct btf *src_btf, const struct btf_t return libbpf_err(sz); /* deconstruct BTF, if necessary, and invalidate raw_data */ - if (btf_ensure_modifiable(btf)) + if (btf_ensure_modifiable(p->dst)) return libbpf_err(-ENOMEM); - t = btf_add_type_mem(btf, sz); + t = btf_add_type_mem(p->dst, sz); if (!t) return libbpf_err(-ENOMEM); @@ -1631,12 +1630,19 @@ int btf__add_type(struct btf *btf, const struct btf *src_btf, const struct btf_t return libbpf_err(err); while ((str_off = btf_field_iter_next(&it))) { - err = btf_rewrite_str(&p, str_off); + err = btf_rewrite_str(p, str_off); if (err) return libbpf_err(err); } - return btf_commit_type(btf, sz); + return btf_commit_type(p->dst, sz); +} + +int btf__add_type(struct btf *btf, const struct btf *src_btf, const struct btf_type *src_type) +{ + struct btf_pipe p = { .src = src_btf, .dst = btf }; + + return btf_add_type(&p, src_type); } static size_t btf_dedup_identity_hash_fn(long key, void *ctx); @@ -5108,3 +5114,304 @@ int btf_ext_visit_str_offs(struct btf_ext *btf_ext, str_off_visit_fn visit, void return 0; } + +struct btf_distill { + struct btf_pipe pipe; + int *id_map; + unsigned int split_start_id; + unsigned int split_start_str; + int diff_id; +}; + +static int btf_add_distilled_type_ids(struct btf_distill *dist, __u32 i) +{ + struct btf_type *split_t = btf_type_by_id(dist->pipe.src, i); + struct btf_field_iter it; + __u32 *id; + int err; + + err = btf_field_iter_init(&it, split_t, BTF_FIELD_ITER_IDS); + if (err) + return err; + while ((id = btf_field_iter_next(&it))) { + struct btf_type *base_t; + + if (!*id) + continue; + /* split BTF id, not needed */ + if (*id >= dist->split_start_id) + continue; + /* already added ? */ + if (dist->id_map[*id] > 0) + continue; + + /* only a subset of base BTF types should be referenced from + * split BTF; ensure nothing unexpected is referenced. 
+ */ + base_t = btf_type_by_id(dist->pipe.src, *id); + switch (btf_kind(base_t)) { + case BTF_KIND_INT: + case BTF_KIND_FLOAT: + case BTF_KIND_FWD: + case BTF_KIND_ARRAY: + case BTF_KIND_STRUCT: + case BTF_KIND_UNION: + case BTF_KIND_TYPEDEF: + case BTF_KIND_ENUM: + case BTF_KIND_ENUM64: + case BTF_KIND_PTR: + case BTF_KIND_CONST: + case BTF_KIND_RESTRICT: + case BTF_KIND_VOLATILE: + case BTF_KIND_FUNC_PROTO: + case BTF_KIND_TYPE_TAG: + dist->id_map[*id] = *id; + break; + default: + pr_warn("unexpected reference to base type[%u] of kind [%u] when creating distilled base BTF.\n", + *id, btf_kind(base_t)); + return -EINVAL; + } + /* If a base type is used, ensure types it refers to are + * marked as used also; so for example if we find a PTR to INT + * we need both the PTR and INT. + * + * The only exception is named struct/unions, since distilled + * base BTF composite types have no members. + */ + if (btf_is_composite(base_t) && base_t->name_off) + continue; + err = btf_add_distilled_type_ids(dist, *id); + if (err) + return err; + } + return 0; +} + +static int btf_add_distilled_types(struct btf_distill *dist) +{ + bool adding_to_base = dist->pipe.dst->start_id == 1; + int id = btf__type_cnt(dist->pipe.dst); + struct btf_type *t; + int i, err = 0; + + + /* Add types for each of the required references to either distilled + * base or split BTF, depending on type characteristics. + */ + for (i = 1; i < dist->split_start_id; i++) { + const char *name; + int kind; + + if (!dist->id_map[i]) + continue; + t = btf_type_by_id(dist->pipe.src, i); + kind = btf_kind(t); + name = btf__name_by_offset(dist->pipe.src, t->name_off); + + switch (kind) { + case BTF_KIND_INT: + case BTF_KIND_FLOAT: + case BTF_KIND_FWD: + /* Named int, float, fwd are added to base. */ + if (!adding_to_base) + continue; + err = btf_add_type(&dist->pipe, t); + break; + case BTF_KIND_STRUCT: + case BTF_KIND_UNION: + /* Named struct/union are added to base as 0-vlen + * struct/union of same size. Anonymous struct/unions + * are added to split BTF as-is. + */ + if (adding_to_base) { + if (!t->name_off) + continue; + err = btf_add_composite(dist->pipe.dst, kind, name, t->size); + } else { + if (t->name_off) + continue; + err = btf_add_type(&dist->pipe, t); + } + break; + case BTF_KIND_ENUM: + case BTF_KIND_ENUM64: + /* Named enum[64]s are added to base as a sized + * enum; relocation will match with appropriately-named + * and sized enum or enum64. + * + * Anonymous enums are added to split BTF as-is. + */ + if (adding_to_base) { + if (!t->name_off) + continue; + err = btf__add_enum(dist->pipe.dst, name, t->size); + } else { + if (t->name_off) + continue; + err = btf_add_type(&dist->pipe, t); + } + break; + case BTF_KIND_ARRAY: + case BTF_KIND_TYPEDEF: + case BTF_KIND_PTR: + case BTF_KIND_CONST: + case BTF_KIND_RESTRICT: + case BTF_KIND_VOLATILE: + case BTF_KIND_FUNC_PROTO: + case BTF_KIND_TYPE_TAG: + /* All other types are added to split BTF. */ + if (adding_to_base) + continue; + err = btf_add_type(&dist->pipe, t); + break; + default: + pr_warn("unexpected kind when adding base type '%s'[%u] of kind [%u] to distilled base BTF.\n", + name, i, kind); + return -EINVAL; + + } + if (err < 0) + break; + dist->id_map[i] = id++; + } + return err; +} + +/* Split BTF ids without a mapping will be shifted downwards since distilled + * base BTF is smaller than the original base BTF. For those that have a + * mapping (either to base or updated split BTF), update the id based on + * that mapping. 
+ */ +static int btf_update_distilled_type_ids(struct btf_distill *dist, __u32 i) +{ + struct btf_type *t = btf_type_by_id(dist->pipe.dst, i); + struct btf_field_iter it; + __u32 *id; + int err; + + err = btf_field_iter_init(&it, t, BTF_FIELD_ITER_IDS); + if (err) + return err; + while ((id = btf_field_iter_next(&it))) { + if (dist->id_map[*id]) + *id = dist->id_map[*id]; + else if (*id >= dist->split_start_id) + *id -= dist->diff_id; + } + return 0; +} + +/* Create updated split BTF with distilled base BTF; distilled base BTF + * consists of BTF information required to clarify the types that split + * BTF refers to, omitting unneeded details. Specifically it will contain + * base types and memberless definitions of named structs, unions and enumerated + * types. Associated reference types like pointers, arrays and anonymous + * structs, unions and enumerated types will be added to split BTF. + * Size is recorded for named struct/unions to help guide matching to the + * target base BTF during later relocation. + * + * The only case where structs, unions or enumerated types are fully represented + * is when they are anonymous; in such cases, the anonymous type is added to + * split BTF in full. + * + * We return newly-created split BTF where the split BTF refers to a newly-created + * distilled base BTF. Both must be freed separately by the caller. + */ +int btf__distill_base(const struct btf *src_btf, struct btf **new_base_btf, + struct btf **new_split_btf) +{ + struct btf *new_base = NULL, *new_split = NULL; + const struct btf *old_base; + unsigned int n = btf__type_cnt(src_btf); + struct btf_distill dist = {}; + struct btf_type *t; + int i, err = 0; + + /* src BTF must be split BTF. */ + old_base = btf__base_btf(src_btf); + if (!new_base_btf || !new_split_btf || !old_base) + return libbpf_err(-EINVAL); + + new_base = btf__new_empty(); + if (!new_base) + return libbpf_err(-ENOMEM); + dist.id_map = calloc(n, sizeof(*dist.id_map)); + if (!dist.id_map) { + err = -ENOMEM; + goto done; + } + dist.pipe.src = src_btf; + dist.pipe.dst = new_base; + dist.pipe.str_off_map = hashmap__new(btf_dedup_identity_hash_fn, btf_dedup_equal_fn, NULL); + if (IS_ERR(dist.pipe.str_off_map)) { + err = -ENOMEM; + goto done; + } + dist.split_start_id = btf__type_cnt(old_base); + dist.split_start_str = old_base->hdr->str_len; + + /* Pass over src split BTF; generate the list of base BTF type ids it + * references; these will constitute our distilled BTF set to be + * distributed over base and split BTF as appropriate. + */ + for (i = src_btf->start_id; i < n; i++) { + err = btf_add_distilled_type_ids(&dist, i); + if (err < 0) + goto done; + } + /* Next add types for each of the required references to base BTF and split BTF + * in turn. + */ + err = btf_add_distilled_types(&dist); + if (err < 0) + goto done; + + /* Create new split BTF with distilled base BTF as its base; the final + * state is split BTF with distilled base BTF that represents enough + * about its base references to allow it to be relocated with the base + * BTF available. + */ + new_split = btf__new_empty_split(new_base); + if (!new_split_btf) { + err = -errno; + goto done; + } + dist.pipe.dst = new_split; + /* First add all split types */ + for (i = src_btf->start_id; i < n; i++) { + t = btf_type_by_id(src_btf, i); + err = btf_add_type(&dist.pipe, t); + if (err < 0) + goto done; + } + /* Now add distilled types to split BTF that are not added to base. 
*/ + err = btf_add_distilled_types(&dist); + if (err < 0) + goto done; + + /* All split BTF ids will be shifted downwards since there are less base + * BTF ids in distilled base BTF. + */ + dist.diff_id = dist.split_start_id - btf__type_cnt(new_base); + + n = btf__type_cnt(new_split); + /* Now update base/split BTF ids. */ + for (i = 1; i < n; i++) { + err = btf_update_distilled_type_ids(&dist, i); + if (err < 0) + break; + } +done: + free(dist.id_map); + hashmap__free(dist.pipe.str_off_map); + if (err) { + btf__free(new_split); + btf__free(new_base); + return libbpf_err(err); + } + *new_base_btf = new_base; + *new_split_btf = new_split; + + return 0; +} diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h index 8e6880d91c84..cb08ee9a5a10 100644 --- a/tools/lib/bpf/btf.h +++ b/tools/lib/bpf/btf.h @@ -107,6 +107,27 @@ LIBBPF_API struct btf *btf__new_empty(void); */ LIBBPF_API struct btf *btf__new_empty_split(struct btf *base_btf); +/** + * @brief **btf__distill_base()** creates new versions of the split BTF + * *src_btf* and its base BTF. The new base BTF will only contain the types + * needed to improve robustness of the split BTF to small changes in base BTF. + * When that split BTF is loaded against a (possibly changed) base, this + * distilled base BTF will help update references to that (possibly changed) + * base BTF. + * + * Both the new split and its associated new base BTF must be freed by + * the caller. + * + * If successful, 0 is returned and **new_base_btf** and **new_split_btf** + * will point at new base/split BTF. Both the new split and its associated + * new base BTF must be freed by the caller. + * + * A negative value is returned on error and the thread-local `errno` variable + * is set to the error code as well. + */ +LIBBPF_API int btf__distill_base(const struct btf *src_btf, struct btf **new_base_btf, + struct btf **new_split_btf); + LIBBPF_API struct btf *btf__parse(const char *path, struct btf_ext **btf_ext); LIBBPF_API struct btf *btf__parse_split(const char *path, struct btf *base_btf); LIBBPF_API struct btf *btf__parse_elf(const char *path, struct btf_ext **btf_ext); diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map index f802a77d10b6..acf4e94bebc9 100644 --- a/tools/lib/bpf/libbpf.map +++ b/tools/lib/bpf/libbpf.map @@ -395,6 +395,7 @@ LIBBPF_1.2.0 { LIBBPF_1.3.0 { global: btf__new_split; + btf__distill_base; bpf_obj_pin_opts; bpf_object__unpin; bpf_prog_detach_opts; -- 2.34.1
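From an application's perspective the new API is a single call that yields two BTF objects the caller owns, typically serialized into .BTF and .BTF.base sections afterwards. A minimal, hypothetical user of the API might look like the following; the input file names are made up, error handling is kept short, and it assumes a libbpf build that exports btf__distill_base (placed in the LIBBPF_1.3.0 node in this tree, per the conflict note above).

#include <stdio.h>
#include <bpf/btf.h>

int main(void)
{
	struct btf *base, *split, *new_base = NULL, *new_split = NULL;
	int err;

	/* hypothetical input paths: vmlinux BTF plus a module's split BTF */
	base = btf__parse("vmlinux-btf", NULL);
	split = btf__parse_split("module-split-btf", base);
	if (!base || !split)
		return 1;

	err = btf__distill_base(split, &new_base, &new_split);
	if (err) {
		fprintf(stderr, "distill failed: %d\n", err);
		return 1;
	}

	printf("distilled base has %u types, new split has %u types\n",
	       btf__type_cnt(new_base), btf__type_cnt(new_split));

	/* both newly created objects must be freed by the caller */
	btf__free(new_split);
	btf__free(new_base);
	btf__free(split);
	btf__free(base);
	return 0;
}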
From: Alan Maguire <alan.maguire@oracle.com> mainline inclusion from mainline-v6.11-rc1 commit eb20e727c4343ad591cff2bef243590c77f62cf1 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Test generation of split+distilled base BTF, ensuring that - named base BTF STRUCTs and UNIONs are represented as 0-vlen sized STRUCT/UNIONs - named ENUM[64]s are represented as 0-vlen named ENUM[64]s - anonymous struct/unions are represented in full in split BTF - anonymous enums are represented in full in split BTF - types unreferenced from split BTF are not present in distilled base BTF Also test that with vmlinux BTF and split BTF based upon it, we only represent needed base types referenced from split BTF in distilled base. Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20240613095014.357981-3-alan.maguire@oracle.com Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- .../selftests/bpf/prog_tests/btf_distill.c | 274 ++++++++++++++++++ 1 file changed, 274 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/btf_distill.c diff --git a/tools/testing/selftests/bpf/prog_tests/btf_distill.c b/tools/testing/selftests/bpf/prog_tests/btf_distill.c new file mode 100644 index 000000000000..5c3a38747962 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/btf_distill.c @@ -0,0 +1,274 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024, Oracle and/or its affiliates. */ + +#include <test_progs.h> +#include <bpf/btf.h> +#include "btf_helpers.h" + +/* Fabricate base, split BTF with references to base types needed; then create + * split BTF with distilled base BTF and ensure expectations are met: + * - only referenced base types from split BTF are present + * - struct/union/enum are represented as empty unless anonymous, when they + * are represented in full in split BTF + */ +static void test_distilled_base(void) +{ + struct btf *btf1 = NULL, *btf2 = NULL, *btf3 = NULL, *btf4 = NULL; + + btf1 = btf__new_empty(); + if (!ASSERT_OK_PTR(btf1, "empty_main_btf")) + return; + + btf__add_int(btf1, "int", 4, BTF_INT_SIGNED); /* [1] int */ + btf__add_ptr(btf1, 1); /* [2] ptr to int */ + btf__add_struct(btf1, "s1", 8); /* [3] struct s1 { */ + btf__add_field(btf1, "f1", 2, 0, 0); /* int *f1; */ + /* } */ + btf__add_struct(btf1, "", 12); /* [4] struct { */ + btf__add_field(btf1, "f1", 1, 0, 0); /* int f1; */ + btf__add_field(btf1, "f2", 3, 32, 0); /* struct s1 f2; */ + /* } */ + btf__add_int(btf1, "unsigned int", 4, 0); /* [5] unsigned int */ + btf__add_union(btf1, "u1", 12); /* [6] union u1 { */ + btf__add_field(btf1, "f1", 1, 0, 0); /* int f1; */ + btf__add_field(btf1, "f2", 2, 0, 0); /* int *f2; */ + /* } */ + btf__add_union(btf1, "", 4); /* [7] union { */ + btf__add_field(btf1, "f1", 1, 0, 0); /* int f1; */ + /* } */ + btf__add_enum(btf1, "e1", 4); /* [8] enum e1 { */ + btf__add_enum_value(btf1, "v1", 1); /* v1 = 1; */ + /* } */ + btf__add_enum(btf1, "", 4); /* [9] enum { */ + btf__add_enum_value(btf1, "av1", 2); /* av1 = 2; */ + /* } */ + btf__add_enum64(btf1, "e641", 8, true); /* [10] enum64 { */ + btf__add_enum64_value(btf1, "v1", 1024); /* v1 = 1024; */ + /* } */ + btf__add_enum64(btf1, "", 8, true); /* [11] enum64 { */ + btf__add_enum64_value(btf1, 
"v1", 1025); /* v1 = 1025; */ + /* } */ + btf__add_struct(btf1, "unneeded", 4); /* [12] struct unneeded { */ + btf__add_field(btf1, "f1", 1, 0, 0); /* int f1; */ + /* } */ + btf__add_struct(btf1, "embedded", 4); /* [13] struct embedded { */ + btf__add_field(btf1, "f1", 1, 0, 0); /* int f1; */ + /* } */ + btf__add_func_proto(btf1, 1); /* [14] int (*)(int *p1); */ + btf__add_func_param(btf1, "p1", 1); + + btf__add_array(btf1, 1, 1, 3); /* [15] int [3]; */ + + btf__add_struct(btf1, "from_proto", 4); /* [16] struct from_proto { */ + btf__add_field(btf1, "f1", 1, 0, 0); /* int f1; */ + /* } */ + btf__add_union(btf1, "u1", 4); /* [17] union u1 { */ + btf__add_field(btf1, "f1", 1, 0, 0); /* int f1; */ + /* } */ + VALIDATE_RAW_BTF( + btf1, + "[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED", + "[2] PTR '(anon)' type_id=1", + "[3] STRUCT 's1' size=8 vlen=1\n" + "\t'f1' type_id=2 bits_offset=0", + "[4] STRUCT '(anon)' size=12 vlen=2\n" + "\t'f1' type_id=1 bits_offset=0\n" + "\t'f2' type_id=3 bits_offset=32", + "[5] INT 'unsigned int' size=4 bits_offset=0 nr_bits=32 encoding=(none)", + "[6] UNION 'u1' size=12 vlen=2\n" + "\t'f1' type_id=1 bits_offset=0\n" + "\t'f2' type_id=2 bits_offset=0", + "[7] UNION '(anon)' size=4 vlen=1\n" + "\t'f1' type_id=1 bits_offset=0", + "[8] ENUM 'e1' encoding=UNSIGNED size=4 vlen=1\n" + "\t'v1' val=1", + "[9] ENUM '(anon)' encoding=UNSIGNED size=4 vlen=1\n" + "\t'av1' val=2", + "[10] ENUM64 'e641' encoding=SIGNED size=8 vlen=1\n" + "\t'v1' val=1024", + "[11] ENUM64 '(anon)' encoding=SIGNED size=8 vlen=1\n" + "\t'v1' val=1025", + "[12] STRUCT 'unneeded' size=4 vlen=1\n" + "\t'f1' type_id=1 bits_offset=0", + "[13] STRUCT 'embedded' size=4 vlen=1\n" + "\t'f1' type_id=1 bits_offset=0", + "[14] FUNC_PROTO '(anon)' ret_type_id=1 vlen=1\n" + "\t'p1' type_id=1", + "[15] ARRAY '(anon)' type_id=1 index_type_id=1 nr_elems=3", + "[16] STRUCT 'from_proto' size=4 vlen=1\n" + "\t'f1' type_id=1 bits_offset=0", + "[17] UNION 'u1' size=4 vlen=1\n" + "\t'f1' type_id=1 bits_offset=0"); + + btf2 = btf__new_empty_split(btf1); + if (!ASSERT_OK_PTR(btf2, "empty_split_btf")) + goto cleanup; + + btf__add_ptr(btf2, 3); /* [18] ptr to struct s1 */ + /* add ptr to struct anon */ + btf__add_ptr(btf2, 4); /* [19] ptr to struct (anon) */ + btf__add_const(btf2, 6); /* [20] const union u1 */ + btf__add_restrict(btf2, 7); /* [21] restrict union (anon) */ + btf__add_volatile(btf2, 8); /* [22] volatile enum e1 */ + btf__add_typedef(btf2, "et", 9); /* [23] typedef enum (anon) */ + btf__add_const(btf2, 10); /* [24] const enum64 e641 */ + btf__add_ptr(btf2, 11); /* [25] restrict enum64 (anon) */ + btf__add_struct(btf2, "with_embedded", 4); /* [26] struct with_embedded { */ + btf__add_field(btf2, "f1", 13, 0, 0); /* struct embedded f1; */ + /* } */ + btf__add_func(btf2, "fn", BTF_FUNC_STATIC, 14); /* [27] int fn(int p1); */ + btf__add_typedef(btf2, "arraytype", 15); /* [28] typedef int[3] foo; */ + btf__add_func_proto(btf2, 1); /* [29] int (*)(struct from proto p1); */ + btf__add_func_param(btf2, "p1", 16); + + VALIDATE_RAW_BTF( + btf2, + "[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED", + "[2] PTR '(anon)' type_id=1", + "[3] STRUCT 's1' size=8 vlen=1\n" + "\t'f1' type_id=2 bits_offset=0", + "[4] STRUCT '(anon)' size=12 vlen=2\n" + "\t'f1' type_id=1 bits_offset=0\n" + "\t'f2' type_id=3 bits_offset=32", + "[5] INT 'unsigned int' size=4 bits_offset=0 nr_bits=32 encoding=(none)", + "[6] UNION 'u1' size=12 vlen=2\n" + "\t'f1' type_id=1 bits_offset=0\n" + "\t'f2' type_id=2 
bits_offset=0", + "[7] UNION '(anon)' size=4 vlen=1\n" + "\t'f1' type_id=1 bits_offset=0", + "[8] ENUM 'e1' encoding=UNSIGNED size=4 vlen=1\n" + "\t'v1' val=1", + "[9] ENUM '(anon)' encoding=UNSIGNED size=4 vlen=1\n" + "\t'av1' val=2", + "[10] ENUM64 'e641' encoding=SIGNED size=8 vlen=1\n" + "\t'v1' val=1024", + "[11] ENUM64 '(anon)' encoding=SIGNED size=8 vlen=1\n" + "\t'v1' val=1025", + "[12] STRUCT 'unneeded' size=4 vlen=1\n" + "\t'f1' type_id=1 bits_offset=0", + "[13] STRUCT 'embedded' size=4 vlen=1\n" + "\t'f1' type_id=1 bits_offset=0", + "[14] FUNC_PROTO '(anon)' ret_type_id=1 vlen=1\n" + "\t'p1' type_id=1", + "[15] ARRAY '(anon)' type_id=1 index_type_id=1 nr_elems=3", + "[16] STRUCT 'from_proto' size=4 vlen=1\n" + "\t'f1' type_id=1 bits_offset=0", + "[17] UNION 'u1' size=4 vlen=1\n" + "\t'f1' type_id=1 bits_offset=0", + "[18] PTR '(anon)' type_id=3", + "[19] PTR '(anon)' type_id=4", + "[20] CONST '(anon)' type_id=6", + "[21] RESTRICT '(anon)' type_id=7", + "[22] VOLATILE '(anon)' type_id=8", + "[23] TYPEDEF 'et' type_id=9", + "[24] CONST '(anon)' type_id=10", + "[25] PTR '(anon)' type_id=11", + "[26] STRUCT 'with_embedded' size=4 vlen=1\n" + "\t'f1' type_id=13 bits_offset=0", + "[27] FUNC 'fn' type_id=14 linkage=static", + "[28] TYPEDEF 'arraytype' type_id=15", + "[29] FUNC_PROTO '(anon)' ret_type_id=1 vlen=1\n" + "\t'p1' type_id=16"); + + if (!ASSERT_EQ(0, btf__distill_base(btf2, &btf3, &btf4), + "distilled_base") || + !ASSERT_OK_PTR(btf3, "distilled_base") || + !ASSERT_OK_PTR(btf4, "distilled_split") || + !ASSERT_EQ(8, btf__type_cnt(btf3), "distilled_base_type_cnt")) + goto cleanup; + + VALIDATE_RAW_BTF( + btf4, + "[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED", + "[2] STRUCT 's1' size=8 vlen=0", + "[3] UNION 'u1' size=12 vlen=0", + "[4] ENUM 'e1' encoding=UNSIGNED size=4 vlen=0", + "[5] ENUM 'e641' encoding=UNSIGNED size=8 vlen=0", + "[6] STRUCT 'embedded' size=4 vlen=0", + "[7] STRUCT 'from_proto' size=4 vlen=0", + /* split BTF; these types should match split BTF above from 17-28, with + * updated type id references + */ + "[8] PTR '(anon)' type_id=2", + "[9] PTR '(anon)' type_id=20", + "[10] CONST '(anon)' type_id=3", + "[11] RESTRICT '(anon)' type_id=21", + "[12] VOLATILE '(anon)' type_id=4", + "[13] TYPEDEF 'et' type_id=22", + "[14] CONST '(anon)' type_id=5", + "[15] PTR '(anon)' type_id=23", + "[16] STRUCT 'with_embedded' size=4 vlen=1\n" + "\t'f1' type_id=6 bits_offset=0", + "[17] FUNC 'fn' type_id=24 linkage=static", + "[18] TYPEDEF 'arraytype' type_id=25", + "[19] FUNC_PROTO '(anon)' ret_type_id=1 vlen=1\n" + "\t'p1' type_id=7", + /* split BTF types added from original base BTF below */ + "[20] STRUCT '(anon)' size=12 vlen=2\n" + "\t'f1' type_id=1 bits_offset=0\n" + "\t'f2' type_id=2 bits_offset=32", + "[21] UNION '(anon)' size=4 vlen=1\n" + "\t'f1' type_id=1 bits_offset=0", + "[22] ENUM '(anon)' encoding=UNSIGNED size=4 vlen=1\n" + "\t'av1' val=2", + "[23] ENUM64 '(anon)' encoding=SIGNED size=8 vlen=1\n" + "\t'v1' val=1025", + "[24] FUNC_PROTO '(anon)' ret_type_id=1 vlen=1\n" + "\t'p1' type_id=1", + "[25] ARRAY '(anon)' type_id=1 index_type_id=1 nr_elems=3"); + +cleanup: + btf__free(btf4); + btf__free(btf3); + btf__free(btf2); + btf__free(btf1); +} + +/* create split reference BTF from vmlinux + split BTF with a few type references; + * ensure the resultant split reference BTF is as expected, containing only types + * needed to disambiguate references from split BTF. 
+ */ +static void test_distilled_base_vmlinux(void) +{ + struct btf *split_btf = NULL, *vmlinux_btf = btf__load_vmlinux_btf(); + struct btf *split_dist = NULL, *base_dist = NULL; + __s32 int_id, myint_id; + + if (!ASSERT_OK_PTR(vmlinux_btf, "load_vmlinux")) + return; + int_id = btf__find_by_name_kind(vmlinux_btf, "int", BTF_KIND_INT); + if (!ASSERT_GT(int_id, 0, "find_int")) + goto cleanup; + split_btf = btf__new_empty_split(vmlinux_btf); + if (!ASSERT_OK_PTR(split_btf, "new_split")) + goto cleanup; + myint_id = btf__add_typedef(split_btf, "myint", int_id); + btf__add_ptr(split_btf, myint_id); + + if (!ASSERT_EQ(btf__distill_base(split_btf, &base_dist, &split_dist), 0, + "distill_vmlinux_base")) + goto cleanup; + + if (!ASSERT_OK_PTR(split_dist, "split_distilled") || + !ASSERT_OK_PTR(base_dist, "base_dist")) + goto cleanup; + VALIDATE_RAW_BTF( + split_dist, + "[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED", + "[2] TYPEDEF 'myint' type_id=1", + "[3] PTR '(anon)' type_id=2"); + +cleanup: + btf__free(split_dist); + btf__free(base_dist); + btf__free(split_btf); + btf__free(vmlinux_btf); +} + +void test_btf_distill(void) +{ + if (test__start_subtest("distilled_base")) + test_distilled_base(); + if (test__start_subtest("distilled_base_vmlinux")) + test_distilled_base_vmlinux(); +} -- 2.34.1
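A small sketch of the kind of invariant the test above encodes, written as ordinary libbpf calls; the struct name "s1", its size of 8 bytes and the assert-based checking are assumptions of the example:

#include <assert.h>
#include <bpf/btf.h>

/* After btf__distill_base(), named composites referenced from split BTF
 * appear in the distilled base as size-only (vlen==0) types.
 */
static void check_distilled_struct(const struct btf *new_base)
{
	const struct btf_type *t;
	__s32 id;

	id = btf__find_by_name_kind(new_base, "s1", BTF_KIND_STRUCT);
	assert(id > 0);

	t = btf__type_by_id(new_base, id);
	assert(btf_vlen(t) == 0);	/* no members in distilled base */
	assert(t->size == 8);		/* but size is preserved */
}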
From: Alan Maguire <alan.maguire@oracle.com> mainline inclusion from mainline-v6.11-rc1 commit 19e00c897d5031bed969dd79af28e899e038009f category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Map distilled base BTF type ids referenced in split BTF and their references to the base BTF passed in, and if the mapping succeeds, reparent the split BTF to the base BTF. Relocation is done by first verifying that distilled base BTF only consists of named INT, FLOAT, ENUM, FWD, STRUCT and UNION kinds; then we sort these to speed lookups. Once sorted, the base BTF is iterated, and for each relevant kind we check for an equivalent in distilled base BTF. When found, the mapping from distilled -> base BTF id and string offset is recorded. In establishing mappings, we need to ensure we check STRUCT/UNION size when the STRUCT/UNION is embedded in a split BTF STRUCT/UNION, and when duplicate names exist for the same STRUCT/UNION. Otherwise size is ignored in matching STRUCT/UNIONs. Once all mappings are established, we can update type ids and string offsets in split BTF and reparent it to the new base. Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20240613095014.357981-4-alan.maguire@oracle.com Conflicts: tools/lib/bpf/Build tools/lib/bpf/libbpf.map [The conflicts were due to btf__relocate is LIBBPF_1.5.0 API, but current version of libbpf is LIBBPF_1.3.0] Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- tools/lib/bpf/Build | 2 +- tools/lib/bpf/btf.c | 17 ++ tools/lib/bpf/btf.h | 14 + tools/lib/bpf/btf_relocate.c | 506 ++++++++++++++++++++++++++++++++ tools/lib/bpf/libbpf.map | 1 + tools/lib/bpf/libbpf_internal.h | 3 + 6 files changed, 542 insertions(+), 1 deletion(-) create mode 100644 tools/lib/bpf/btf_relocate.c diff --git a/tools/lib/bpf/Build b/tools/lib/bpf/Build index 2d0c282c8588..18c006c4be71 100644 --- a/tools/lib/bpf/Build +++ b/tools/lib/bpf/Build @@ -1,4 +1,4 @@ libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_errno.o str_error.o \ netlink.o bpf_prog_linfo.o libbpf_probes.o hashmap.o \ btf_dump.o ringbuf.o strset.o linker.o gen_loader.o relo_core.o \ - usdt.o zip.o elf.o + usdt.o zip.o elf.o btf_relocate.o diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c index 5ef5544578ab..0efb0a645e3d 100644 --- a/tools/lib/bpf/btf.c +++ b/tools/lib/bpf/btf.c @@ -5415,3 +5415,20 @@ int btf__distill_base(const struct btf *src_btf, struct btf **new_base_btf, return 0; } + +const struct btf_header *btf_header(const struct btf *btf) +{ + return btf->hdr; +} + +void btf_set_base_btf(struct btf *btf, const struct btf *base_btf) +{ + btf->base_btf = (struct btf *)base_btf; + btf->start_id = btf__type_cnt(base_btf); + btf->start_str_off = base_btf->hdr->str_len; +} + +int btf__relocate(struct btf *btf, const struct btf *base_btf) +{ + return libbpf_err(btf_relocate(btf, base_btf, NULL)); +} diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h index cb08ee9a5a10..8a93120b7385 100644 --- a/tools/lib/bpf/btf.h +++ b/tools/lib/bpf/btf.h @@ -252,6 +252,20 @@ struct btf_dedup_opts { LIBBPF_API int btf__dedup(struct btf *btf, const struct btf_dedup_opts *opts); +/** + * @brief **btf__relocate()** will check the split BTF *btf* for references + * to base BTF kinds, and verify those 
references are compatible with + * *base_btf*; if they are, *btf* is adjusted such that is re-parented to + * *base_btf* and type ids and strings are adjusted to accommodate this. + * + * If successful, 0 is returned and **btf** now has **base_btf** as its + * base. + * + * A negative value is returned on error and the thread-local `errno` variable + * is set to the error code as well. + */ +LIBBPF_API int btf__relocate(struct btf *btf, const struct btf *base_btf); + struct btf_dump; struct btf_dump_opts { diff --git a/tools/lib/bpf/btf_relocate.c b/tools/lib/bpf/btf_relocate.c new file mode 100644 index 000000000000..eabb8755f662 --- /dev/null +++ b/tools/lib/bpf/btf_relocate.c @@ -0,0 +1,506 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2024, Oracle and/or its affiliates. */ + +#ifndef _GNU_SOURCE +#define _GNU_SOURCE +#endif + +#include "btf.h" +#include "bpf.h" +#include "libbpf.h" +#include "libbpf_internal.h" + +struct btf; + +struct btf_relocate { + struct btf *btf; + const struct btf *base_btf; + const struct btf *dist_base_btf; + unsigned int nr_base_types; + unsigned int nr_split_types; + unsigned int nr_dist_base_types; + int dist_str_len; + int base_str_len; + __u32 *id_map; + __u32 *str_map; +}; + +/* Set temporarily in relocation id_map if distilled base struct/union is + * embedded in a split BTF struct/union; in such a case, size information must + * match between distilled base BTF and base BTF representation of type. + */ +#define BTF_IS_EMBEDDED ((__u32)-1) + +/* <name, size, id> triple used in sorting/searching distilled base BTF. */ +struct btf_name_info { + const char *name; + /* set when search requires a size match */ + int needs_size:1, + size:31; + __u32 id; +}; + +static int btf_relocate_rewrite_type_id(struct btf_relocate *r, __u32 i) +{ + struct btf_type *t = btf_type_by_id(r->btf, i); + struct btf_field_iter it; + __u32 *id; + int err; + + err = btf_field_iter_init(&it, t, BTF_FIELD_ITER_IDS); + if (err) + return err; + + while ((id = btf_field_iter_next(&it))) + *id = r->id_map[*id]; + return 0; +} + +/* Simple string comparison used for sorting within BTF, since all distilled + * types are named. If strings match, and size is non-zero for both elements + * fall back to using size for ordering. + */ +static int cmp_btf_name_size(const void *n1, const void *n2) +{ + const struct btf_name_info *ni1 = n1; + const struct btf_name_info *ni2 = n2; + int name_diff = strcmp(ni1->name, ni2->name); + + if (!name_diff && ni1->needs_size && ni2->needs_size) + return ni2->size - ni1->size; + return name_diff; +} + +/* Binary search with a small twist; find leftmost element that matches + * so that we can then iterate through all exact matches. So for example + * searching { "a", "bb", "bb", "c" } we would always match on the + * leftmost "bb". + */ +static struct btf_name_info *search_btf_name_size(struct btf_name_info *key, + struct btf_name_info *vals, + int nelems) +{ + struct btf_name_info *ret = NULL; + int high = nelems - 1; + int low = 0; + + while (low <= high) { + int mid = (low + high)/2; + struct btf_name_info *val = &vals[mid]; + int diff = cmp_btf_name_size(key, val); + + if (diff == 0) + ret = val; + /* even if found, keep searching for leftmost match */ + if (diff <= 0) + high = mid - 1; + else + low = mid + 1; + } + return ret; +} + +/* If a member of a split BTF struct/union refers to a base BTF + * struct/union, mark that struct/union id temporarily in the id_map + * with BTF_IS_EMBEDDED. 
Members can be const/restrict/volatile/typedef + * reference types, but if a pointer is encountered, the type is no longer + * considered embedded. + */ +static int btf_mark_embedded_composite_type_ids(struct btf_relocate *r, __u32 i) +{ + struct btf_type *t = btf_type_by_id(r->btf, i); + struct btf_field_iter it; + __u32 *id; + int err; + + if (!btf_is_composite(t)) + return 0; + + err = btf_field_iter_init(&it, t, BTF_FIELD_ITER_IDS); + if (err) + return err; + + while ((id = btf_field_iter_next(&it))) { + __u32 next_id = *id; + + while (next_id) { + t = btf_type_by_id(r->btf, next_id); + switch (btf_kind(t)) { + case BTF_KIND_CONST: + case BTF_KIND_RESTRICT: + case BTF_KIND_VOLATILE: + case BTF_KIND_TYPEDEF: + case BTF_KIND_TYPE_TAG: + next_id = t->type; + break; + case BTF_KIND_ARRAY: { + struct btf_array *a = btf_array(t); + + next_id = a->type; + break; + } + case BTF_KIND_STRUCT: + case BTF_KIND_UNION: + if (next_id < r->nr_dist_base_types) + r->id_map[next_id] = BTF_IS_EMBEDDED; + next_id = 0; + break; + default: + next_id = 0; + break; + } + } + } + + return 0; +} + +/* Build a map from distilled base BTF ids to base BTF ids. To do so, iterate + * through base BTF looking up distilled type (using binary search) equivalents. + */ +static int btf_relocate_map_distilled_base(struct btf_relocate *r) +{ + struct btf_name_info *dist_base_info_sorted, *dist_base_info_sorted_end; + struct btf_type *base_t, *dist_t; + __u8 *base_name_cnt = NULL; + int err = 0; + __u32 id; + + /* generate a sort index array of name/type ids sorted by name for + * distilled base BTF to speed name-based lookups. + */ + dist_base_info_sorted = calloc(r->nr_dist_base_types, sizeof(*dist_base_info_sorted)); + if (!dist_base_info_sorted) { + err = -ENOMEM; + goto done; + } + dist_base_info_sorted_end = dist_base_info_sorted + r->nr_dist_base_types; + for (id = 0; id < r->nr_dist_base_types; id++) { + dist_t = btf_type_by_id(r->dist_base_btf, id); + dist_base_info_sorted[id].name = btf__name_by_offset(r->dist_base_btf, + dist_t->name_off); + dist_base_info_sorted[id].id = id; + dist_base_info_sorted[id].size = dist_t->size; + dist_base_info_sorted[id].needs_size = true; + } + qsort(dist_base_info_sorted, r->nr_dist_base_types, sizeof(*dist_base_info_sorted), + cmp_btf_name_size); + + /* Mark distilled base struct/union members of split BTF structs/unions + * in id_map with BTF_IS_EMBEDDED; this signals that these types + * need to match both name and size, otherwise embeddding the base + * struct/union in the split type is invalid. + */ + for (id = r->nr_dist_base_types; id < r->nr_split_types; id++) { + err = btf_mark_embedded_composite_type_ids(r, id); + if (err) + goto done; + } + + /* Collect name counts for composite types in base BTF. If multiple + * instances of a struct/union of the same name exist, we need to use + * size to determine which to map to since name alone is ambiguous. + */ + base_name_cnt = calloc(r->base_str_len, sizeof(*base_name_cnt)); + if (!base_name_cnt) { + err = -ENOMEM; + goto done; + } + for (id = 1; id < r->nr_base_types; id++) { + base_t = btf_type_by_id(r->base_btf, id); + if (!btf_is_composite(base_t) || !base_t->name_off) + continue; + if (base_name_cnt[base_t->name_off] < 255) + base_name_cnt[base_t->name_off]++; + } + + /* Now search base BTF for matching distilled base BTF types. 
*/ + for (id = 1; id < r->nr_base_types; id++) { + struct btf_name_info *dist_name_info, *dist_name_info_next = NULL; + struct btf_name_info base_name_info = {}; + int dist_kind, base_kind; + + base_t = btf_type_by_id(r->base_btf, id); + /* distilled base consists of named types only. */ + if (!base_t->name_off) + continue; + base_kind = btf_kind(base_t); + base_name_info.id = id; + base_name_info.name = btf__name_by_offset(r->base_btf, base_t->name_off); + switch (base_kind) { + case BTF_KIND_INT: + case BTF_KIND_FLOAT: + case BTF_KIND_ENUM: + case BTF_KIND_ENUM64: + /* These types should match both name and size */ + base_name_info.needs_size = true; + base_name_info.size = base_t->size; + break; + case BTF_KIND_FWD: + /* No size considerations for fwds. */ + break; + case BTF_KIND_STRUCT: + case BTF_KIND_UNION: + /* Size only needs to be used for struct/union if there + * are multiple types in base BTF with the same name. + * If there are multiple _distilled_ types with the same + * name (a very unlikely scenario), that doesn't matter + * unless corresponding _base_ types to match them are + * missing. + */ + base_name_info.needs_size = base_name_cnt[base_t->name_off] > 1; + base_name_info.size = base_t->size; + break; + default: + continue; + } + /* iterate over all matching distilled base types */ + for (dist_name_info = search_btf_name_size(&base_name_info, dist_base_info_sorted, + r->nr_dist_base_types); + dist_name_info != NULL; dist_name_info = dist_name_info_next) { + /* Are there more distilled matches to process after + * this one? + */ + dist_name_info_next = dist_name_info + 1; + if (dist_name_info_next >= dist_base_info_sorted_end || + cmp_btf_name_size(&base_name_info, dist_name_info_next)) + dist_name_info_next = NULL; + + if (!dist_name_info->id || dist_name_info->id > r->nr_dist_base_types) { + pr_warn("base BTF id [%d] maps to invalid distilled base BTF id [%d]\n", + id, dist_name_info->id); + err = -EINVAL; + goto done; + } + dist_t = btf_type_by_id(r->dist_base_btf, dist_name_info->id); + dist_kind = btf_kind(dist_t); + + /* Validate that the found distilled type is compatible. + * Do not error out on mismatch as another match may + * occur for an identically-named type. + */ + switch (dist_kind) { + case BTF_KIND_FWD: + switch (base_kind) { + case BTF_KIND_FWD: + if (btf_kflag(dist_t) != btf_kflag(base_t)) + continue; + break; + case BTF_KIND_STRUCT: + if (btf_kflag(base_t)) + continue; + break; + case BTF_KIND_UNION: + if (!btf_kflag(base_t)) + continue; + break; + default: + continue; + } + break; + case BTF_KIND_INT: + if (dist_kind != base_kind || + btf_int_encoding(base_t) != btf_int_encoding(dist_t)) + continue; + break; + case BTF_KIND_FLOAT: + if (dist_kind != base_kind) + continue; + break; + case BTF_KIND_ENUM: + /* ENUM and ENUM64 are encoded as sized ENUM in + * distilled base BTF. + */ + if (base_kind != dist_kind && base_kind != BTF_KIND_ENUM64) + continue; + break; + case BTF_KIND_STRUCT: + case BTF_KIND_UNION: + /* size verification is required for embedded + * struct/unions. + */ + if (r->id_map[dist_name_info->id] == BTF_IS_EMBEDDED && + base_t->size != dist_t->size) + continue; + break; + default: + continue; + } + if (r->id_map[dist_name_info->id] && + r->id_map[dist_name_info->id] != BTF_IS_EMBEDDED) { + /* we already have a match; this tells us that + * multiple base types of the same name + * have the same size, since for cases where + * multiple types have the same name we match + * on name and size. 
In this case, we have + * no way of determining which to relocate + * to in base BTF, so error out. + */ + pr_warn("distilled base BTF type '%s' [%u], size %u has multiple candidates of the same size (ids [%u, %u]) in base BTF\n", + base_name_info.name, dist_name_info->id, + base_t->size, id, r->id_map[dist_name_info->id]); + err = -EINVAL; + goto done; + } + /* map id and name */ + r->id_map[dist_name_info->id] = id; + r->str_map[dist_t->name_off] = base_t->name_off; + } + } + /* ensure all distilled BTF ids now have a mapping... */ + for (id = 1; id < r->nr_dist_base_types; id++) { + const char *name; + + if (r->id_map[id] && r->id_map[id] != BTF_IS_EMBEDDED) + continue; + dist_t = btf_type_by_id(r->dist_base_btf, id); + name = btf__name_by_offset(r->dist_base_btf, dist_t->name_off); + pr_warn("distilled base BTF type '%s' [%d] is not mapped to base BTF id\n", + name, id); + err = -EINVAL; + break; + } +done: + free(base_name_cnt); + free(dist_base_info_sorted); + return err; +} + +/* distilled base should only have named int/float/enum/fwd/struct/union types. */ +static int btf_relocate_validate_distilled_base(struct btf_relocate *r) +{ + unsigned int i; + + for (i = 1; i < r->nr_dist_base_types; i++) { + struct btf_type *t = btf_type_by_id(r->dist_base_btf, i); + int kind = btf_kind(t); + + switch (kind) { + case BTF_KIND_INT: + case BTF_KIND_FLOAT: + case BTF_KIND_ENUM: + case BTF_KIND_STRUCT: + case BTF_KIND_UNION: + case BTF_KIND_FWD: + if (t->name_off) + break; + pr_warn("type [%d], kind [%d] is invalid for distilled base BTF; it is anonymous\n", + i, kind); + return -EINVAL; + default: + pr_warn("type [%d] in distilled based BTF has unexpected kind [%d]\n", + i, kind); + return -EINVAL; + } + } + return 0; +} + +static int btf_relocate_rewrite_strs(struct btf_relocate *r, __u32 i) +{ + struct btf_type *t = btf_type_by_id(r->btf, i); + struct btf_field_iter it; + __u32 *str_off; + int off, err; + + err = btf_field_iter_init(&it, t, BTF_FIELD_ITER_STRS); + if (err) + return err; + + while ((str_off = btf_field_iter_next(&it))) { + if (!*str_off) + continue; + if (*str_off >= r->dist_str_len) { + *str_off += r->base_str_len - r->dist_str_len; + } else { + off = r->str_map[*str_off]; + if (!off) { + pr_warn("string '%s' [offset %u] is not mapped to base BTF", + btf__str_by_offset(r->btf, off), *str_off); + return -ENOENT; + } + *str_off = off; + } + } + return 0; +} + +/* If successful, output of relocation is updated BTF with base BTF pointing + * at base_btf, and type ids, strings adjusted accordingly. 
+ */ +int btf_relocate(struct btf *btf, const struct btf *base_btf, __u32 **id_map) +{ + unsigned int nr_types = btf__type_cnt(btf); + const struct btf_header *dist_base_hdr; + const struct btf_header *base_hdr; + struct btf_relocate r = {}; + int err = 0; + __u32 id, i; + + r.dist_base_btf = btf__base_btf(btf); + if (!base_btf || r.dist_base_btf == base_btf) + return -EINVAL; + + r.nr_dist_base_types = btf__type_cnt(r.dist_base_btf); + r.nr_base_types = btf__type_cnt(base_btf); + r.nr_split_types = nr_types - r.nr_dist_base_types; + r.btf = btf; + r.base_btf = base_btf; + + r.id_map = calloc(nr_types, sizeof(*r.id_map)); + r.str_map = calloc(btf_header(r.dist_base_btf)->str_len, sizeof(*r.str_map)); + dist_base_hdr = btf_header(r.dist_base_btf); + base_hdr = btf_header(r.base_btf); + r.dist_str_len = dist_base_hdr->str_len; + r.base_str_len = base_hdr->str_len; + if (!r.id_map || !r.str_map) { + err = -ENOMEM; + goto err_out; + } + + err = btf_relocate_validate_distilled_base(&r); + if (err) + goto err_out; + + /* Split BTF ids need to be adjusted as base and distilled base + * have different numbers of types, changing the start id of split + * BTF. + */ + for (id = r.nr_dist_base_types; id < nr_types; id++) + r.id_map[id] = id + r.nr_base_types - r.nr_dist_base_types; + + /* Build a map from distilled base ids to actual base BTF ids; it is used + * to update split BTF id references. Also build a str_map mapping from + * distilled base BTF names to base BTF names. + */ + err = btf_relocate_map_distilled_base(&r); + if (err) + goto err_out; + + /* Next, rewrite type ids in split BTF, replacing split ids with updated + * ids based on number of types in base BTF, and base ids with + * relocated ids from base_btf. + */ + for (i = 0, id = r.nr_dist_base_types; i < r.nr_split_types; i++, id++) { + err = btf_relocate_rewrite_type_id(&r, id); + if (err) + goto err_out; + } + /* String offsets now need to be updated using the str_map. */ + for (i = 0; i < r.nr_split_types; i++) { + err = btf_relocate_rewrite_strs(&r, i + r.nr_dist_base_types); + if (err) + goto err_out; + } + /* Finally reset base BTF to be base_btf */ + btf_set_base_btf(btf, base_btf); + + if (id_map) { + *id_map = r.id_map; + r.id_map = NULL; + } +err_out: + free(r.id_map); + free(r.str_map); + return err; +} diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map index acf4e94bebc9..14476d734601 100644 --- a/tools/lib/bpf/libbpf.map +++ b/tools/lib/bpf/libbpf.map @@ -396,6 +396,7 @@ LIBBPF_1.3.0 { global: btf__new_split; btf__distill_base; + btf__relocate; bpf_obj_pin_opts; bpf_object__unpin; bpf_prog_detach_opts; diff --git a/tools/lib/bpf/libbpf_internal.h b/tools/lib/bpf/libbpf_internal.h index 2f594b4b732f..e88f49ea64ef 100644 --- a/tools/lib/bpf/libbpf_internal.h +++ b/tools/lib/bpf/libbpf_internal.h @@ -233,6 +233,9 @@ struct btf_type; struct btf_type *btf_type_by_id(const struct btf *btf, __u32 type_id); const char *btf_kind_str(const struct btf_type *t); const struct btf_type *skip_mods_and_typedefs(const struct btf *btf, __u32 id, __u32 *res_id); +const struct btf_header *btf_header(const struct btf *btf); +void btf_set_base_btf(struct btf *btf, const struct btf *base_btf); +int btf_relocate(struct btf *btf, const struct btf *base_btf, __u32 **id_map); static inline enum btf_func_linkage btf_func_linkage(const struct btf_type *t) { -- 2.34.1
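To show where this fits for callers, a brief sketch of relocating split BTF onto the running kernel's BTF; "split_btf" is assumed to carry a distilled base created as in the preceding patches, and the error reporting is illustrative only:

#include <errno.h>
#include <stdio.h>
#include <bpf/btf.h>

static int relocate_onto_vmlinux(struct btf *split_btf)
{
	struct btf *vmlinux_btf = btf__load_vmlinux_btf();
	int err;

	if (!vmlinux_btf)
		return -errno;

	/* rewrite split BTF type ids and string offsets, then re-parent it
	 * onto the running kernel's base BTF
	 */
	err = btf__relocate(split_btf, vmlinux_btf);
	if (err)
		fprintf(stderr, "btf__relocate failed: %d\n", err);

	/* on success split_btf now references vmlinux_btf, which must stay
	 * alive for as long as split_btf is used
	 */
	return err;
}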
From: Alan Maguire <alan.maguire@oracle.com> mainline inclusion from mainline-v6.11-rc1 commit affdeb50616b190c3236cc2bf116e1b931a43be2 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Ensure relocated BTF looks as expected; in this case identical to original split BTF, with a few duplicate anonymous types added to split BTF by the relocation process. Also add relocation tests for edge cases like missing type in base BTF and multiple types of the same name. Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20240613095014.357981-5-alan.maguire@oracle.com Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- .../selftests/bpf/prog_tests/btf_distill.c | 278 ++++++++++++++++++ 1 file changed, 278 insertions(+) diff --git a/tools/testing/selftests/bpf/prog_tests/btf_distill.c b/tools/testing/selftests/bpf/prog_tests/btf_distill.c index 5c3a38747962..bfbe795823a2 100644 --- a/tools/testing/selftests/bpf/prog_tests/btf_distill.c +++ b/tools/testing/selftests/bpf/prog_tests/btf_distill.c @@ -217,7 +217,277 @@ static void test_distilled_base(void) "\t'p1' type_id=1", "[25] ARRAY '(anon)' type_id=1 index_type_id=1 nr_elems=3"); + if (!ASSERT_EQ(btf__relocate(btf4, btf1), 0, "relocate_split")) + goto cleanup; + + VALIDATE_RAW_BTF( + btf4, + "[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED", + "[2] PTR '(anon)' type_id=1", + "[3] STRUCT 's1' size=8 vlen=1\n" + "\t'f1' type_id=2 bits_offset=0", + "[4] STRUCT '(anon)' size=12 vlen=2\n" + "\t'f1' type_id=1 bits_offset=0\n" + "\t'f2' type_id=3 bits_offset=32", + "[5] INT 'unsigned int' size=4 bits_offset=0 nr_bits=32 encoding=(none)", + "[6] UNION 'u1' size=12 vlen=2\n" + "\t'f1' type_id=1 bits_offset=0\n" + "\t'f2' type_id=2 bits_offset=0", + "[7] UNION '(anon)' size=4 vlen=1\n" + "\t'f1' type_id=1 bits_offset=0", + "[8] ENUM 'e1' encoding=UNSIGNED size=4 vlen=1\n" + "\t'v1' val=1", + "[9] ENUM '(anon)' encoding=UNSIGNED size=4 vlen=1\n" + "\t'av1' val=2", + "[10] ENUM64 'e641' encoding=SIGNED size=8 vlen=1\n" + "\t'v1' val=1024", + "[11] ENUM64 '(anon)' encoding=SIGNED size=8 vlen=1\n" + "\t'v1' val=1025", + "[12] STRUCT 'unneeded' size=4 vlen=1\n" + "\t'f1' type_id=1 bits_offset=0", + "[13] STRUCT 'embedded' size=4 vlen=1\n" + "\t'f1' type_id=1 bits_offset=0", + "[14] FUNC_PROTO '(anon)' ret_type_id=1 vlen=1\n" + "\t'p1' type_id=1", + "[15] ARRAY '(anon)' type_id=1 index_type_id=1 nr_elems=3", + "[16] STRUCT 'from_proto' size=4 vlen=1\n" + "\t'f1' type_id=1 bits_offset=0", + "[17] UNION 'u1' size=4 vlen=1\n" + "\t'f1' type_id=1 bits_offset=0", + "[18] PTR '(anon)' type_id=3", + "[19] PTR '(anon)' type_id=30", + "[20] CONST '(anon)' type_id=6", + "[21] RESTRICT '(anon)' type_id=31", + "[22] VOLATILE '(anon)' type_id=8", + "[23] TYPEDEF 'et' type_id=32", + "[24] CONST '(anon)' type_id=10", + "[25] PTR '(anon)' type_id=33", + "[26] STRUCT 'with_embedded' size=4 vlen=1\n" + "\t'f1' type_id=13 bits_offset=0", + "[27] FUNC 'fn' type_id=34 linkage=static", + "[28] TYPEDEF 'arraytype' type_id=35", + "[29] FUNC_PROTO '(anon)' ret_type_id=1 vlen=1\n" + "\t'p1' type_id=16", + /* below here are (duplicate) anon base types added by distill + * process to split BTF. 
+ */ + "[30] STRUCT '(anon)' size=12 vlen=2\n" + "\t'f1' type_id=1 bits_offset=0\n" + "\t'f2' type_id=3 bits_offset=32", + "[31] UNION '(anon)' size=4 vlen=1\n" + "\t'f1' type_id=1 bits_offset=0", + "[32] ENUM '(anon)' encoding=UNSIGNED size=4 vlen=1\n" + "\t'av1' val=2", + "[33] ENUM64 '(anon)' encoding=SIGNED size=8 vlen=1\n" + "\t'v1' val=1025", + "[34] FUNC_PROTO '(anon)' ret_type_id=1 vlen=1\n" + "\t'p1' type_id=1", + "[35] ARRAY '(anon)' type_id=1 index_type_id=1 nr_elems=3"); + +cleanup: + btf__free(btf4); + btf__free(btf3); + btf__free(btf2); + btf__free(btf1); +} + +/* ensure we can cope with multiple types with the same name in + * distilled base BTF. In this case because sizes are different, + * we can still disambiguate them. + */ +static void test_distilled_base_multi(void) +{ + struct btf *btf1 = NULL, *btf2 = NULL, *btf3 = NULL, *btf4 = NULL; + + btf1 = btf__new_empty(); + if (!ASSERT_OK_PTR(btf1, "empty_main_btf")) + return; + btf__add_int(btf1, "int", 4, BTF_INT_SIGNED); /* [1] int */ + btf__add_int(btf1, "int", 8, BTF_INT_SIGNED); /* [2] int */ + VALIDATE_RAW_BTF( + btf1, + "[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED", + "[2] INT 'int' size=8 bits_offset=0 nr_bits=64 encoding=SIGNED"); + btf2 = btf__new_empty_split(btf1); + if (!ASSERT_OK_PTR(btf2, "empty_split_btf")) + goto cleanup; + btf__add_ptr(btf2, 1); + btf__add_const(btf2, 2); + VALIDATE_RAW_BTF( + btf2, + "[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED", + "[2] INT 'int' size=8 bits_offset=0 nr_bits=64 encoding=SIGNED", + "[3] PTR '(anon)' type_id=1", + "[4] CONST '(anon)' type_id=2"); + if (!ASSERT_EQ(0, btf__distill_base(btf2, &btf3, &btf4), + "distilled_base") || + !ASSERT_OK_PTR(btf3, "distilled_base") || + !ASSERT_OK_PTR(btf4, "distilled_split") || + !ASSERT_EQ(3, btf__type_cnt(btf3), "distilled_base_type_cnt")) + goto cleanup; + VALIDATE_RAW_BTF( + btf3, + "[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED", + "[2] INT 'int' size=8 bits_offset=0 nr_bits=64 encoding=SIGNED"); + if (!ASSERT_EQ(btf__relocate(btf4, btf1), 0, "relocate_split")) + goto cleanup; + + VALIDATE_RAW_BTF( + btf4, + "[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED", + "[2] INT 'int' size=8 bits_offset=0 nr_bits=64 encoding=SIGNED", + "[3] PTR '(anon)' type_id=1", + "[4] CONST '(anon)' type_id=2"); + +cleanup: + btf__free(btf4); + btf__free(btf3); + btf__free(btf2); + btf__free(btf1); +} + +/* If a needed type is not present in the base BTF we wish to relocate + * with, btf__relocate() should error our. 
+ */ +static void test_distilled_base_missing_err(void) +{ + struct btf *btf1 = NULL, *btf2 = NULL, *btf3 = NULL, *btf4 = NULL, *btf5 = NULL; + + btf1 = btf__new_empty(); + if (!ASSERT_OK_PTR(btf1, "empty_main_btf")) + return; + btf__add_int(btf1, "int", 4, BTF_INT_SIGNED); /* [1] int */ + btf__add_int(btf1, "int", 8, BTF_INT_SIGNED); /* [2] int */ + VALIDATE_RAW_BTF( + btf1, + "[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED", + "[2] INT 'int' size=8 bits_offset=0 nr_bits=64 encoding=SIGNED"); + btf2 = btf__new_empty_split(btf1); + if (!ASSERT_OK_PTR(btf2, "empty_split_btf")) + goto cleanup; + btf__add_ptr(btf2, 1); + btf__add_const(btf2, 2); + VALIDATE_RAW_BTF( + btf2, + "[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED", + "[2] INT 'int' size=8 bits_offset=0 nr_bits=64 encoding=SIGNED", + "[3] PTR '(anon)' type_id=1", + "[4] CONST '(anon)' type_id=2"); + if (!ASSERT_EQ(0, btf__distill_base(btf2, &btf3, &btf4), + "distilled_base") || + !ASSERT_OK_PTR(btf3, "distilled_base") || + !ASSERT_OK_PTR(btf4, "distilled_split") || + !ASSERT_EQ(3, btf__type_cnt(btf3), "distilled_base_type_cnt")) + goto cleanup; + VALIDATE_RAW_BTF( + btf3, + "[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED", + "[2] INT 'int' size=8 bits_offset=0 nr_bits=64 encoding=SIGNED"); + btf5 = btf__new_empty(); + if (!ASSERT_OK_PTR(btf5, "empty_reloc_btf")) + return; + btf__add_int(btf5, "int", 4, BTF_INT_SIGNED); /* [1] int */ + VALIDATE_RAW_BTF( + btf5, + "[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED"); + ASSERT_EQ(btf__relocate(btf4, btf5), -EINVAL, "relocate_split"); + +cleanup: + btf__free(btf5); + btf__free(btf4); + btf__free(btf3); + btf__free(btf2); + btf__free(btf1); +} + +/* With 2 types of same size in distilled base BTF, relocation should + * fail as we have no means to choose between them. + */ +static void test_distilled_base_multi_err(void) +{ + struct btf *btf1 = NULL, *btf2 = NULL, *btf3 = NULL, *btf4 = NULL; + + btf1 = btf__new_empty(); + if (!ASSERT_OK_PTR(btf1, "empty_main_btf")) + return; + btf__add_int(btf1, "int", 4, BTF_INT_SIGNED); /* [1] int */ + btf__add_int(btf1, "int", 4, BTF_INT_SIGNED); /* [2] int */ + VALIDATE_RAW_BTF( + btf1, + "[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED", + "[2] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED"); + btf2 = btf__new_empty_split(btf1); + if (!ASSERT_OK_PTR(btf2, "empty_split_btf")) + goto cleanup; + btf__add_ptr(btf2, 1); + btf__add_const(btf2, 2); + VALIDATE_RAW_BTF( + btf2, + "[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED", + "[2] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED", + "[3] PTR '(anon)' type_id=1", + "[4] CONST '(anon)' type_id=2"); + if (!ASSERT_EQ(0, btf__distill_base(btf2, &btf3, &btf4), + "distilled_base") || + !ASSERT_OK_PTR(btf3, "distilled_base") || + !ASSERT_OK_PTR(btf4, "distilled_split") || + !ASSERT_EQ(3, btf__type_cnt(btf3), "distilled_base_type_cnt")) + goto cleanup; + VALIDATE_RAW_BTF( + btf3, + "[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED", + "[2] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED"); + ASSERT_EQ(btf__relocate(btf4, btf1), -EINVAL, "relocate_split"); +cleanup: + btf__free(btf4); + btf__free(btf3); + btf__free(btf2); + btf__free(btf1); +} + +/* With 2 types of same size in base BTF, relocation should + * fail as we have no means to choose between them. 
+ */ +static void test_distilled_base_multi_err2(void) +{ + struct btf *btf1 = NULL, *btf2 = NULL, *btf3 = NULL, *btf4 = NULL, *btf5 = NULL; + + btf1 = btf__new_empty(); + if (!ASSERT_OK_PTR(btf1, "empty_main_btf")) + return; + btf__add_int(btf1, "int", 4, BTF_INT_SIGNED); /* [1] int */ + VALIDATE_RAW_BTF( + btf1, + "[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED"); + btf2 = btf__new_empty_split(btf1); + if (!ASSERT_OK_PTR(btf2, "empty_split_btf")) + goto cleanup; + btf__add_ptr(btf2, 1); + VALIDATE_RAW_BTF( + btf2, + "[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED", + "[2] PTR '(anon)' type_id=1"); + if (!ASSERT_EQ(0, btf__distill_base(btf2, &btf3, &btf4), + "distilled_base") || + !ASSERT_OK_PTR(btf3, "distilled_base") || + !ASSERT_OK_PTR(btf4, "distilled_split") || + !ASSERT_EQ(2, btf__type_cnt(btf3), "distilled_base_type_cnt")) + goto cleanup; + VALIDATE_RAW_BTF( + btf3, + "[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED"); + btf5 = btf__new_empty(); + if (!ASSERT_OK_PTR(btf5, "empty_reloc_btf")) + return; + btf__add_int(btf5, "int", 4, BTF_INT_SIGNED); /* [1] int */ + btf__add_int(btf5, "int", 4, BTF_INT_SIGNED); /* [2] int */ + VALIDATE_RAW_BTF( + btf5, + "[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED", + "[2] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED"); + ASSERT_EQ(btf__relocate(btf4, btf5), -EINVAL, "relocate_split"); cleanup: + btf__free(btf5); btf__free(btf4); btf__free(btf3); btf__free(btf2); @@ -269,6 +539,14 @@ void test_btf_distill(void) { if (test__start_subtest("distilled_base")) test_distilled_base(); + if (test__start_subtest("distilled_base_multi")) + test_distilled_base_multi(); + if (test__start_subtest("distilled_base_missing_err")) + test_distilled_base_missing_err(); + if (test__start_subtest("distilled_base_multi_err")) + test_distilled_base_multi_err(); + if (test__start_subtest("distilled_base_multi_err2")) + test_distilled_base_multi_err2(); if (test__start_subtest("distilled_base_vmlinux")) test_distilled_base_vmlinux(); } -- 2.34.1
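The error-path behaviour exercised above amounts to a single check at the call site; a minimal sketch, with placeholder variable names:

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <bpf/btf.h>

static void try_relocate(struct btf *split_btf, const struct btf *base_btf)
{
	int err = btf__relocate(split_btf, base_btf);

	/* -EINVAL covers both a distilled type missing from base BTF and a
	 * same-name/same-size ambiguity, as the tests above demonstrate
	 */
	if (err)
		fprintf(stderr, "btf__relocate: %s\n", strerror(-err));
}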
From: Eduard Zingerman <eddyz87@gmail.com> mainline inclusion from mainline-v6.11-rc1 commit c86f180ffc993975fed5907a869fc9b1555d0cfb category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Update btf_parse_elf() to check if .BTF.base section is present. The logic is as follows: if .BTF.base section exists: distilled_base := btf_new(.BTF.base) if distilled_base: btf := btf_new(.BTF, .base_btf=distilled_base) if base_btf: btf_relocate(btf, base_btf) else: btf := btf_new(.BTF) return btf In other words: - if .BTF.base section exists, load BTF from it and use it as a base for .BTF load; - if base_btf is specified and .BTF.base section exist, relocate newly loaded .BTF against base_btf. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240613095014.357981-6-alan.maguire@oracle.com Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- tools/lib/bpf/btf.c | 164 +++++++++++++++++++++++++++++--------------- tools/lib/bpf/btf.h | 1 + 2 files changed, 111 insertions(+), 54 deletions(-) diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c index 0efb0a645e3d..cd74b6a3ec99 100644 --- a/tools/lib/bpf/btf.c +++ b/tools/lib/bpf/btf.c @@ -116,6 +116,9 @@ struct btf { /* whether strings are already deduplicated */ bool strs_deduped; + /* whether base_btf should be freed in btf_free for this instance */ + bool owns_base; + /* BTF object FD, if loaded into kernel */ int fd; @@ -810,6 +813,8 @@ void btf__free(struct btf *btf) free(btf->raw_data); free(btf->raw_data_swapped); free(btf->type_offs); + if (btf->owns_base) + btf__free(btf->base_btf); free(btf); } @@ -924,53 +929,38 @@ struct btf *btf__new_split(const void *data, __u32 size, struct btf *base_btf) return libbpf_ptr(btf_new(data, size, base_btf)); } -static struct btf *btf_parse_elf(const char *path, struct btf *base_btf, - struct btf_ext **btf_ext) +struct btf_elf_secs { + Elf_Data *btf_data; + Elf_Data *btf_ext_data; + Elf_Data *btf_base_data; +}; + +static int btf_find_elf_sections(Elf *elf, const char *path, struct btf_elf_secs *secs) { - Elf_Data *btf_data = NULL, *btf_ext_data = NULL; - int err = 0, fd = -1, idx = 0; - struct btf *btf = NULL; Elf_Scn *scn = NULL; - Elf *elf = NULL; + Elf_Data *data; GElf_Ehdr ehdr; size_t shstrndx; + int idx = 0; - if (elf_version(EV_CURRENT) == EV_NONE) { - pr_warn("failed to init libelf for %s\n", path); - return ERR_PTR(-LIBBPF_ERRNO__LIBELF); - } - - fd = open(path, O_RDONLY | O_CLOEXEC); - if (fd < 0) { - err = -errno; - pr_warn("failed to open %s: %s\n", path, strerror(errno)); - return ERR_PTR(err); - } - - err = -LIBBPF_ERRNO__FORMAT; - - elf = elf_begin(fd, ELF_C_READ, NULL); - if (!elf) { - pr_warn("failed to open %s as ELF file\n", path); - goto done; - } if (!gelf_getehdr(elf, &ehdr)) { pr_warn("failed to get EHDR from %s\n", path); - goto done; + goto err; } if (elf_getshdrstrndx(elf, &shstrndx)) { pr_warn("failed to get section names section index for %s\n", path); - goto done; + goto err; } if (!elf_rawdata(elf_getscn(elf, shstrndx), NULL)) { pr_warn("failed to get e_shstrndx from %s\n", path); - goto done; + goto err; } while ((scn = elf_nextscn(elf, scn)) != NULL) { + Elf_Data **field; GElf_Shdr sh; char *name; @@ -978,42 +968,102 @@ static struct btf 
*btf_parse_elf(const char *path, struct btf *base_btf, if (gelf_getshdr(scn, &sh) != &sh) { pr_warn("failed to get section(%d) header from %s\n", idx, path); - goto done; + goto err; } name = elf_strptr(elf, shstrndx, sh.sh_name); if (!name) { pr_warn("failed to get section(%d) name from %s\n", idx, path); - goto done; + goto err; } - if (strcmp(name, BTF_ELF_SEC) == 0) { - btf_data = elf_getdata(scn, 0); - if (!btf_data) { - pr_warn("failed to get section(%d, %s) data from %s\n", - idx, name, path); - goto done; - } - continue; - } else if (btf_ext && strcmp(name, BTF_EXT_ELF_SEC) == 0) { - btf_ext_data = elf_getdata(scn, 0); - if (!btf_ext_data) { - pr_warn("failed to get section(%d, %s) data from %s\n", - idx, name, path); - goto done; - } + + if (strcmp(name, BTF_ELF_SEC) == 0) + field = &secs->btf_data; + else if (strcmp(name, BTF_EXT_ELF_SEC) == 0) + field = &secs->btf_ext_data; + else if (strcmp(name, BTF_BASE_ELF_SEC) == 0) + field = &secs->btf_base_data; + else continue; + + data = elf_getdata(scn, 0); + if (!data) { + pr_warn("failed to get section(%d, %s) data from %s\n", + idx, name, path); + goto err; } + *field = data; + } + + return 0; + +err: + return -LIBBPF_ERRNO__FORMAT; +} + +static struct btf *btf_parse_elf(const char *path, struct btf *base_btf, + struct btf_ext **btf_ext) +{ + struct btf_elf_secs secs = {}; + struct btf *dist_base_btf = NULL; + struct btf *btf = NULL; + int err = 0, fd = -1; + Elf *elf = NULL; + + if (elf_version(EV_CURRENT) == EV_NONE) { + pr_warn("failed to init libelf for %s\n", path); + return ERR_PTR(-LIBBPF_ERRNO__LIBELF); + } + + fd = open(path, O_RDONLY | O_CLOEXEC); + if (fd < 0) { + err = -errno; + pr_warn("failed to open %s: %s\n", path, strerror(errno)); + return ERR_PTR(err); } - if (!btf_data) { + elf = elf_begin(fd, ELF_C_READ, NULL); + if (!elf) { + pr_warn("failed to open %s as ELF file\n", path); + goto done; + } + + err = btf_find_elf_sections(elf, path, &secs); + if (err) + goto done; + + if (!secs.btf_data) { pr_warn("failed to find '%s' ELF section in %s\n", BTF_ELF_SEC, path); err = -ENODATA; goto done; } - btf = btf_new(btf_data->d_buf, btf_data->d_size, base_btf); - err = libbpf_get_error(btf); - if (err) + + if (secs.btf_base_data) { + dist_base_btf = btf_new(secs.btf_base_data->d_buf, secs.btf_base_data->d_size, + NULL); + if (IS_ERR(dist_base_btf)) { + err = PTR_ERR(dist_base_btf); + dist_base_btf = NULL; + goto done; + } + } + + btf = btf_new(secs.btf_data->d_buf, secs.btf_data->d_size, + dist_base_btf ?: base_btf); + if (IS_ERR(btf)) { + err = PTR_ERR(btf); goto done; + } + if (dist_base_btf && base_btf) { + err = btf__relocate(btf, base_btf); + if (err) + goto done; + btf__free(dist_base_btf); + dist_base_btf = NULL; + } + + if (dist_base_btf) + btf->owns_base = true; switch (gelf_getclass(elf)) { case ELFCLASS32: @@ -1027,11 +1077,12 @@ static struct btf *btf_parse_elf(const char *path, struct btf *base_btf, break; } - if (btf_ext && btf_ext_data) { - *btf_ext = btf_ext__new(btf_ext_data->d_buf, btf_ext_data->d_size); - err = libbpf_get_error(*btf_ext); - if (err) + if (btf_ext && secs.btf_ext_data) { + *btf_ext = btf_ext__new(secs.btf_ext_data->d_buf, secs.btf_ext_data->d_size); + if (IS_ERR(*btf_ext)) { + err = PTR_ERR(*btf_ext); goto done; + } } else if (btf_ext) { *btf_ext = NULL; } @@ -1045,6 +1096,7 @@ static struct btf *btf_parse_elf(const char *path, struct btf *base_btf, if (btf_ext) btf_ext__free(*btf_ext); + btf__free(dist_base_btf); btf__free(btf); return ERR_PTR(err); @@ -5430,5 +5482,9 @@ void 
btf_set_base_btf(struct btf *btf, const struct btf *base_btf) int btf__relocate(struct btf *btf, const struct btf *base_btf) { - return libbpf_err(btf_relocate(btf, base_btf, NULL)); + int err = btf_relocate(btf, base_btf, NULL); + + if (!err) + btf->owns_base = false; + return libbpf_err(err); } diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h index 8a93120b7385..b68d216837a9 100644 --- a/tools/lib/bpf/btf.h +++ b/tools/lib/bpf/btf.h @@ -18,6 +18,7 @@ extern "C" { #define BTF_ELF_SEC ".BTF" #define BTF_EXT_ELF_SEC ".BTF.ext" +#define BTF_BASE_ELF_SEC ".BTF.base" #define MAPS_ELF_SEC ".maps" struct btf; -- 2.34.1
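With this change no new API calls are needed on the consumer side; a sketch of the transparent path, where the file name "mod.ko" is an assumption of the example:

#include <bpf/btf.h>

/* vmlinux_btf must outlive the returned module BTF, which is re-parented
 * onto it during parsing.
 */
static struct btf *load_module_btf(struct btf *vmlinux_btf)
{
	struct btf *mod_btf;

	/* If "mod.ko" carries a .BTF.base section, btf_parse_elf() loads it
	 * as an interim base for .BTF and then relocates the result onto
	 * vmlinux_btf; otherwise vmlinux_btf is used directly as the base.
	 */
	mod_btf = btf__parse_split("mod.ko", vmlinux_btf);

	return mod_btf;	/* NULL on error, with errno set */
}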
From: Alan Maguire <alan.maguire@oracle.com> mainline inclusion from mainline-v6.11-rc1 commit 6ba77385f386053cea2a1cad33717de74a26db4e category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Now that btf_parse_elf() handles .BTF.base section presence, we need to ensure that resolve_btfids uses .BTF.base when present rather than the vmlinux base BTF passed in via the -B option. Detect .BTF.base section presence and unset the base BTF path to ensure that BTF ELF parsing will do the right thing. Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20240613095014.357981-7-alan.maguire@oracle.com Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- tools/bpf/resolve_btfids/main.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/tools/bpf/resolve_btfids/main.c b/tools/bpf/resolve_btfids/main.c index b3edc239fe56..d54aaa0619df 100644 --- a/tools/bpf/resolve_btfids/main.c +++ b/tools/bpf/resolve_btfids/main.c @@ -409,6 +409,14 @@ static int elf_collect(struct object *obj) obj->efile.idlist = data; obj->efile.idlist_shndx = idx; obj->efile.idlist_addr = sh.sh_addr; + } else if (!strcmp(name, BTF_BASE_ELF_SEC)) { + /* If a .BTF.base section is found, do not resolve + * BTF ids relative to vmlinux; resolve relative + * to the .BTF.base section instead. btf__parse_split() + * will take care of this once the base BTF it is + * passed is NULL. + */ + obj->base_btf_path = NULL; } if (compressed_section_fix(elf, scn, &sh)) -- 2.34.1
From: Alan Maguire <alan.maguire@oracle.com> mainline inclusion from mainline-v6.11-rc1 commit d1cf840854bb603c0718a011bc993f69f2df014e category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Use less verbose names in BTF relocation code and fix off-by-one error and typo in btf_relocate.c. Simplify loop over matching distilled types, moving from assigning a _next value in loop body to moving match check conditions into the guard. Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com> Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20240620091733.1967885-2-alan.maguire@oracle.com Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- tools/lib/bpf/btf_relocate.c | 72 ++++++++++++++++-------------------- 1 file changed, 31 insertions(+), 41 deletions(-) diff --git a/tools/lib/bpf/btf_relocate.c b/tools/lib/bpf/btf_relocate.c index eabb8755f662..23a41fb03e0d 100644 --- a/tools/lib/bpf/btf_relocate.c +++ b/tools/lib/bpf/btf_relocate.c @@ -160,7 +160,7 @@ static int btf_mark_embedded_composite_type_ids(struct btf_relocate *r, __u32 i) */ static int btf_relocate_map_distilled_base(struct btf_relocate *r) { - struct btf_name_info *dist_base_info_sorted, *dist_base_info_sorted_end; + struct btf_name_info *info, *info_end; struct btf_type *base_t, *dist_t; __u8 *base_name_cnt = NULL; int err = 0; @@ -169,26 +169,24 @@ static int btf_relocate_map_distilled_base(struct btf_relocate *r) /* generate a sort index array of name/type ids sorted by name for * distilled base BTF to speed name-based lookups. */ - dist_base_info_sorted = calloc(r->nr_dist_base_types, sizeof(*dist_base_info_sorted)); - if (!dist_base_info_sorted) { + info = calloc(r->nr_dist_base_types, sizeof(*info)); + if (!info) { err = -ENOMEM; goto done; } - dist_base_info_sorted_end = dist_base_info_sorted + r->nr_dist_base_types; + info_end = info + r->nr_dist_base_types; for (id = 0; id < r->nr_dist_base_types; id++) { dist_t = btf_type_by_id(r->dist_base_btf, id); - dist_base_info_sorted[id].name = btf__name_by_offset(r->dist_base_btf, - dist_t->name_off); - dist_base_info_sorted[id].id = id; - dist_base_info_sorted[id].size = dist_t->size; - dist_base_info_sorted[id].needs_size = true; + info[id].name = btf__name_by_offset(r->dist_base_btf, dist_t->name_off); + info[id].id = id; + info[id].size = dist_t->size; + info[id].needs_size = true; } - qsort(dist_base_info_sorted, r->nr_dist_base_types, sizeof(*dist_base_info_sorted), - cmp_btf_name_size); + qsort(info, r->nr_dist_base_types, sizeof(*info), cmp_btf_name_size); /* Mark distilled base struct/union members of split BTF structs/unions * in id_map with BTF_IS_EMBEDDED; this signals that these types - * need to match both name and size, otherwise embeddding the base + * need to match both name and size, otherwise embedding the base * struct/union in the split type is invalid. */ for (id = r->nr_dist_base_types; id < r->nr_split_types; id++) { @@ -216,8 +214,7 @@ static int btf_relocate_map_distilled_base(struct btf_relocate *r) /* Now search base BTF for matching distilled base BTF types. 
*/ for (id = 1; id < r->nr_base_types; id++) { - struct btf_name_info *dist_name_info, *dist_name_info_next = NULL; - struct btf_name_info base_name_info = {}; + struct btf_name_info *dist_info, base_info = {}; int dist_kind, base_kind; base_t = btf_type_by_id(r->base_btf, id); @@ -225,16 +222,16 @@ static int btf_relocate_map_distilled_base(struct btf_relocate *r) if (!base_t->name_off) continue; base_kind = btf_kind(base_t); - base_name_info.id = id; - base_name_info.name = btf__name_by_offset(r->base_btf, base_t->name_off); + base_info.id = id; + base_info.name = btf__name_by_offset(r->base_btf, base_t->name_off); switch (base_kind) { case BTF_KIND_INT: case BTF_KIND_FLOAT: case BTF_KIND_ENUM: case BTF_KIND_ENUM64: /* These types should match both name and size */ - base_name_info.needs_size = true; - base_name_info.size = base_t->size; + base_info.needs_size = true; + base_info.size = base_t->size; break; case BTF_KIND_FWD: /* No size considerations for fwds. */ @@ -248,31 +245,24 @@ static int btf_relocate_map_distilled_base(struct btf_relocate *r) * unless corresponding _base_ types to match them are * missing. */ - base_name_info.needs_size = base_name_cnt[base_t->name_off] > 1; - base_name_info.size = base_t->size; + base_info.needs_size = base_name_cnt[base_t->name_off] > 1; + base_info.size = base_t->size; break; default: continue; } /* iterate over all matching distilled base types */ - for (dist_name_info = search_btf_name_size(&base_name_info, dist_base_info_sorted, - r->nr_dist_base_types); - dist_name_info != NULL; dist_name_info = dist_name_info_next) { - /* Are there more distilled matches to process after - * this one? - */ - dist_name_info_next = dist_name_info + 1; - if (dist_name_info_next >= dist_base_info_sorted_end || - cmp_btf_name_size(&base_name_info, dist_name_info_next)) - dist_name_info_next = NULL; - - if (!dist_name_info->id || dist_name_info->id > r->nr_dist_base_types) { + for (dist_info = search_btf_name_size(&base_info, info, r->nr_dist_base_types); + dist_info != NULL && dist_info < info_end && + cmp_btf_name_size(&base_info, dist_info) == 0; + dist_info++) { + if (!dist_info->id || dist_info->id >= r->nr_dist_base_types) { pr_warn("base BTF id [%d] maps to invalid distilled base BTF id [%d]\n", - id, dist_name_info->id); + id, dist_info->id); err = -EINVAL; goto done; } - dist_t = btf_type_by_id(r->dist_base_btf, dist_name_info->id); + dist_t = btf_type_by_id(r->dist_base_btf, dist_info->id); dist_kind = btf_kind(dist_t); /* Validate that the found distilled type is compatible. @@ -319,15 +309,15 @@ static int btf_relocate_map_distilled_base(struct btf_relocate *r) /* size verification is required for embedded * struct/unions. */ - if (r->id_map[dist_name_info->id] == BTF_IS_EMBEDDED && + if (r->id_map[dist_info->id] == BTF_IS_EMBEDDED && base_t->size != dist_t->size) continue; break; default: continue; } - if (r->id_map[dist_name_info->id] && - r->id_map[dist_name_info->id] != BTF_IS_EMBEDDED) { + if (r->id_map[dist_info->id] && + r->id_map[dist_info->id] != BTF_IS_EMBEDDED) { /* we already have a match; this tells us that * multiple base types of the same name * have the same size, since for cases where @@ -337,13 +327,13 @@ static int btf_relocate_map_distilled_base(struct btf_relocate *r) * to in base BTF, so error out. 
*/ pr_warn("distilled base BTF type '%s' [%u], size %u has multiple candidates of the same size (ids [%u, %u]) in base BTF\n", - base_name_info.name, dist_name_info->id, - base_t->size, id, r->id_map[dist_name_info->id]); + base_info.name, dist_info->id, + base_t->size, id, r->id_map[dist_info->id]); err = -EINVAL; goto done; } /* map id and name */ - r->id_map[dist_name_info->id] = id; + r->id_map[dist_info->id] = id; r->str_map[dist_t->name_off] = base_t->name_off; } } @@ -362,7 +352,7 @@ static int btf_relocate_map_distilled_base(struct btf_relocate *r) } done: free(base_name_cnt); - free(dist_base_info_sorted); + free(info); return err; } -- 2.34.1
From: Alan Maguire <alan.maguire@oracle.com> mainline inclusion from mainline-v6.11-rc1 commit d4e48e3dd45017abdd69a19285d197de897ef44f category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- ...as this will allow split BTF modules with a base BTF representation (rather than the full vmlinux BTF at time of BTF encoding) to resolve their references to kernel types in a way that is more resilient to small changes in kernel types. This will allow modules that are not built every time the kernel is to provide more resilient BTF, rather than have it invalidated every time BTF ids for core kernel types change. Fields are ordered to avoid holes in struct module. Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240620091733.1967885-3-alan.maguire@oracle.com Conflicts: include/linux/module.h [The conflicts were due to some minor issue.] Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- include/linux/module.h | 2 ++ kernel/module/main.c | 5 ++++- 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/include/linux/module.h b/include/linux/module.h index 63edb1d830d3..96f1a88b69c7 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -517,7 +517,9 @@ struct module { #endif #ifdef CONFIG_DEBUG_INFO_BTF_MODULES unsigned int btf_data_size; + unsigned int btf_base_data_size; void *btf_data; + void *btf_base_data; #else KABI_DEPRECATE(unsigned int, btf_data_size) KABI_DEPRECATE(void *, btf_data) diff --git a/kernel/module/main.c b/kernel/module/main.c index a221b102f0e3..5bb35a5fe2d0 100644 --- a/kernel/module/main.c +++ b/kernel/module/main.c @@ -2164,6 +2164,8 @@ static int find_module_sections(struct module *mod, struct load_info *info) #endif #ifdef CONFIG_DEBUG_INFO_BTF_MODULES mod->btf_data = any_section_objs(info, ".BTF", 1, &mod->btf_data_size); + mod->btf_base_data = any_section_objs(info, ".BTF.base", 1, + &mod->btf_base_data_size); #endif #ifdef CONFIG_JUMP_LABEL mod->jump_entries = section_objs(info, "__jump_table", @@ -2598,8 +2600,9 @@ static noinline int do_init_module(struct module *mod) } #ifdef CONFIG_DEBUG_INFO_BTF_MODULES - /* .BTF is not SHF_ALLOC and will get removed, so sanitize pointer */ + /* .BTF is not SHF_ALLOC and will get removed, so sanitize pointers */ mod->btf_data = NULL; + mod->btf_base_data = NULL; #endif /* * We want to free module_init, but be aware that kallsyms may be -- 2.34.1
From: Pu Lehui <pulehui@huawei.com> hulk inclusion category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA -------------------------------- When backporting resilient split BTF, we need to add the members `unsigned int btf_base_data_size` and `void *btf_base_data` to the struct module. According to the layout of the struct module shown below, we can use KABI_FILE_HOLE to fill `unsigned int btf_base_data_size` and use the reserved space with KABI_USE to handle the rest. Additionally, since CONFIG_DEBUG_INFO_BTF_MODULES has been re-enabled, we can remove the unused KABI_DEPRECATE. struct module { ... /* --- cacheline 17 boundary (1088 bytes) --- */ unsigned int btf_data_size; /* 1088 4 */ /* XXX 4 bytes hole, try to pack */ void * btf_data; /* 1096 8 */ struct jump_entry * jump_entries; /* 1104 8 */ ... } Fixes: de93c21fa0f7 ("[Backport] module, bpf: Store BTF base pointer in struct module") Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- include/linux/module.h | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/include/linux/module.h b/include/linux/module.h index 96f1a88b69c7..a4204d279fc9 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -517,9 +517,8 @@ struct module { #endif #ifdef CONFIG_DEBUG_INFO_BTF_MODULES unsigned int btf_data_size; - unsigned int btf_base_data_size; + KABI_FILL_HOLE(unsigned int btf_base_data_size) void *btf_data; - void *btf_base_data; #else KABI_DEPRECATE(unsigned int, btf_data_size) KABI_DEPRECATE(void *, btf_data) @@ -609,7 +608,11 @@ struct module { #ifdef CONFIG_DYNAMIC_DEBUG_CORE struct _ddebug_info dyndbg_info; #endif +#ifdef CONFIG_DEBUG_INFO_BTF_MODULES + KABI_USE(1, void *btf_base_data) +#else KABI_RESERVE(1) +#endif KABI_RESERVE(2) KABI_RESERVE(3) KABI_RESERVE(4) -- 2.34.1
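The 4-byte hole that KABI_FILL_HOLE() reuses is a plain alignment artifact: on 64-bit targets a 4-byte unsigned int followed by an 8-byte pointer leaves 4 bytes of padding. A minimal illustration of the layout principle (these structs are not the real struct module) showing that placing another unsigned int in the hole keeps every member offset and the total size unchanged:

#include <stdio.h>

/* Before: 4 bytes of padding follow btf_data_size on 64-bit targets. */
struct before {
	unsigned int btf_data_size;	/* offset 0, 4 bytes */
	/* 4-byte hole: the next member needs 8-byte alignment */
	void *btf_data;			/* offset 8, 8 bytes */
};

/* After: the new counter occupies the former hole, so offsets and total
 * size stay the same, which is what KABI_FILL_HOLE() relies on.
 */
struct after {
	unsigned int btf_data_size;	/* offset 0 */
	unsigned int btf_base_data_size;/* offset 4, fills the hole */
	void *btf_data;			/* offset 8 */
};

int main(void)
{
	printf("before: %zu bytes, after: %zu bytes\n",
	       sizeof(struct before), sizeof(struct after));
	return 0;	/* both print 16 on typical LP64 ABIs */
}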
From: Alan Maguire <alan.maguire@oracle.com> mainline inclusion from mainline-v6.11-rc1 commit e7ac331b30555cf1a0826784a346f36dbf800451 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- This will allow it to be shared with the kernel. No functional change. Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240620091733.1967885-4-alan.maguire@oracle.com Conflicts: tools/lib/bpf/Build [The conflicts were due to some minor issue.] Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- tools/lib/bpf/Build | 2 +- tools/lib/bpf/btf.c | 162 ------------------------------------- tools/lib/bpf/btf_iter.c | 169 +++++++++++++++++++++++++++++++++++++++ 3 files changed, 170 insertions(+), 163 deletions(-) create mode 100644 tools/lib/bpf/btf_iter.c diff --git a/tools/lib/bpf/Build b/tools/lib/bpf/Build index 18c006c4be71..41dae3ca2685 100644 --- a/tools/lib/bpf/Build +++ b/tools/lib/bpf/Build @@ -1,4 +1,4 @@ libbpf-y := libbpf.o bpf.o nlattr.o btf.o libbpf_errno.o str_error.o \ netlink.o bpf_prog_linfo.o libbpf_probes.o hashmap.o \ btf_dump.o ringbuf.o strset.o linker.o gen_loader.o relo_core.o \ - usdt.o zip.o elf.o btf_relocate.o + usdt.o zip.o elf.o btf_iter.o btf_relocate.o diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c index cd74b6a3ec99..db8cfaeba6f7 100644 --- a/tools/lib/bpf/btf.c +++ b/tools/lib/bpf/btf.c @@ -4925,168 +4925,6 @@ struct btf *btf__load_module_btf(const char *module_name, struct btf *vmlinux_bt return btf__parse_split(path, vmlinux_btf); } -int btf_field_iter_init(struct btf_field_iter *it, struct btf_type *t, enum btf_field_iter_kind iter_kind) -{ - it->p = NULL; - it->m_idx = -1; - it->off_idx = 0; - it->vlen = 0; - - switch (iter_kind) { - case BTF_FIELD_ITER_IDS: - switch (btf_kind(t)) { - case BTF_KIND_UNKN: - case BTF_KIND_INT: - case BTF_KIND_FLOAT: - case BTF_KIND_ENUM: - case BTF_KIND_ENUM64: - it->desc = (struct btf_field_desc) {}; - break; - case BTF_KIND_FWD: - case BTF_KIND_CONST: - case BTF_KIND_VOLATILE: - case BTF_KIND_RESTRICT: - case BTF_KIND_PTR: - case BTF_KIND_TYPEDEF: - case BTF_KIND_FUNC: - case BTF_KIND_VAR: - case BTF_KIND_DECL_TAG: - case BTF_KIND_TYPE_TAG: - it->desc = (struct btf_field_desc) { 1, {offsetof(struct btf_type, type)} }; - break; - case BTF_KIND_ARRAY: - it->desc = (struct btf_field_desc) { - 2, {sizeof(struct btf_type) + offsetof(struct btf_array, type), - sizeof(struct btf_type) + offsetof(struct btf_array, index_type)} - }; - break; - case BTF_KIND_STRUCT: - case BTF_KIND_UNION: - it->desc = (struct btf_field_desc) { - 0, {}, - sizeof(struct btf_member), - 1, {offsetof(struct btf_member, type)} - }; - break; - case BTF_KIND_FUNC_PROTO: - it->desc = (struct btf_field_desc) { - 1, {offsetof(struct btf_type, type)}, - sizeof(struct btf_param), - 1, {offsetof(struct btf_param, type)} - }; - break; - case BTF_KIND_DATASEC: - it->desc = (struct btf_field_desc) { - 0, {}, - sizeof(struct btf_var_secinfo), - 1, {offsetof(struct btf_var_secinfo, type)} - }; - break; - default: - return -EINVAL; - } - break; - case BTF_FIELD_ITER_STRS: - switch (btf_kind(t)) { - case BTF_KIND_UNKN: - it->desc = (struct btf_field_desc) {}; - break; - case BTF_KIND_INT: - case BTF_KIND_FLOAT: - case BTF_KIND_FWD: - case 
BTF_KIND_ARRAY: - case BTF_KIND_CONST: - case BTF_KIND_VOLATILE: - case BTF_KIND_RESTRICT: - case BTF_KIND_PTR: - case BTF_KIND_TYPEDEF: - case BTF_KIND_FUNC: - case BTF_KIND_VAR: - case BTF_KIND_DECL_TAG: - case BTF_KIND_TYPE_TAG: - case BTF_KIND_DATASEC: - it->desc = (struct btf_field_desc) { - 1, {offsetof(struct btf_type, name_off)} - }; - break; - case BTF_KIND_ENUM: - it->desc = (struct btf_field_desc) { - 1, {offsetof(struct btf_type, name_off)}, - sizeof(struct btf_enum), - 1, {offsetof(struct btf_enum, name_off)} - }; - break; - case BTF_KIND_ENUM64: - it->desc = (struct btf_field_desc) { - 1, {offsetof(struct btf_type, name_off)}, - sizeof(struct btf_enum64), - 1, {offsetof(struct btf_enum64, name_off)} - }; - break; - case BTF_KIND_STRUCT: - case BTF_KIND_UNION: - it->desc = (struct btf_field_desc) { - 1, {offsetof(struct btf_type, name_off)}, - sizeof(struct btf_member), - 1, {offsetof(struct btf_member, name_off)} - }; - break; - case BTF_KIND_FUNC_PROTO: - it->desc = (struct btf_field_desc) { - 1, {offsetof(struct btf_type, name_off)}, - sizeof(struct btf_param), - 1, {offsetof(struct btf_param, name_off)} - }; - break; - default: - return -EINVAL; - } - break; - default: - return -EINVAL; - } - - if (it->desc.m_sz) - it->vlen = btf_vlen(t); - - it->p = t; - return 0; -} - -__u32 *btf_field_iter_next(struct btf_field_iter *it) -{ - if (!it->p) - return NULL; - - if (it->m_idx < 0) { - if (it->off_idx < it->desc.t_off_cnt) - return it->p + it->desc.t_offs[it->off_idx++]; - /* move to per-member iteration */ - it->m_idx = 0; - it->p += sizeof(struct btf_type); - it->off_idx = 0; - } - - /* if type doesn't have members, stop */ - if (it->desc.m_sz == 0) { - it->p = NULL; - return NULL; - } - - if (it->off_idx >= it->desc.m_off_cnt) { - /* exhausted this member's fields, go to the next member */ - it->m_idx++; - it->p += it->desc.m_sz; - it->off_idx = 0; - } - - if (it->m_idx < it->vlen) - return it->p + it->desc.m_offs[it->off_idx++]; - - it->p = NULL; - return NULL; -} - int btf_ext_visit_type_ids(struct btf_ext *btf_ext, type_id_visit_fn visit, void *ctx) { const struct btf_ext_info *seg; diff --git a/tools/lib/bpf/btf_iter.c b/tools/lib/bpf/btf_iter.c new file mode 100644 index 000000000000..c308aa60285d --- /dev/null +++ b/tools/lib/bpf/btf_iter.c @@ -0,0 +1,169 @@ +// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) +/* Copyright (c) 2021 Facebook */ +/* Copyright (c) 2024, Oracle and/or its affiliates. 
*/ + +#include "btf.h" +#include "libbpf_internal.h" + +int btf_field_iter_init(struct btf_field_iter *it, struct btf_type *t, + enum btf_field_iter_kind iter_kind) +{ + it->p = NULL; + it->m_idx = -1; + it->off_idx = 0; + it->vlen = 0; + + switch (iter_kind) { + case BTF_FIELD_ITER_IDS: + switch (btf_kind(t)) { + case BTF_KIND_UNKN: + case BTF_KIND_INT: + case BTF_KIND_FLOAT: + case BTF_KIND_ENUM: + case BTF_KIND_ENUM64: + it->desc = (struct btf_field_desc) {}; + break; + case BTF_KIND_FWD: + case BTF_KIND_CONST: + case BTF_KIND_VOLATILE: + case BTF_KIND_RESTRICT: + case BTF_KIND_PTR: + case BTF_KIND_TYPEDEF: + case BTF_KIND_FUNC: + case BTF_KIND_VAR: + case BTF_KIND_DECL_TAG: + case BTF_KIND_TYPE_TAG: + it->desc = (struct btf_field_desc) { 1, {offsetof(struct btf_type, type)} }; + break; + case BTF_KIND_ARRAY: + it->desc = (struct btf_field_desc) { + 2, {sizeof(struct btf_type) + offsetof(struct btf_array, type), + sizeof(struct btf_type) + offsetof(struct btf_array, index_type)} + }; + break; + case BTF_KIND_STRUCT: + case BTF_KIND_UNION: + it->desc = (struct btf_field_desc) { + 0, {}, + sizeof(struct btf_member), + 1, {offsetof(struct btf_member, type)} + }; + break; + case BTF_KIND_FUNC_PROTO: + it->desc = (struct btf_field_desc) { + 1, {offsetof(struct btf_type, type)}, + sizeof(struct btf_param), + 1, {offsetof(struct btf_param, type)} + }; + break; + case BTF_KIND_DATASEC: + it->desc = (struct btf_field_desc) { + 0, {}, + sizeof(struct btf_var_secinfo), + 1, {offsetof(struct btf_var_secinfo, type)} + }; + break; + default: + return -EINVAL; + } + break; + case BTF_FIELD_ITER_STRS: + switch (btf_kind(t)) { + case BTF_KIND_UNKN: + it->desc = (struct btf_field_desc) {}; + break; + case BTF_KIND_INT: + case BTF_KIND_FLOAT: + case BTF_KIND_FWD: + case BTF_KIND_ARRAY: + case BTF_KIND_CONST: + case BTF_KIND_VOLATILE: + case BTF_KIND_RESTRICT: + case BTF_KIND_PTR: + case BTF_KIND_TYPEDEF: + case BTF_KIND_FUNC: + case BTF_KIND_VAR: + case BTF_KIND_DECL_TAG: + case BTF_KIND_TYPE_TAG: + case BTF_KIND_DATASEC: + it->desc = (struct btf_field_desc) { + 1, {offsetof(struct btf_type, name_off)} + }; + break; + case BTF_KIND_ENUM: + it->desc = (struct btf_field_desc) { + 1, {offsetof(struct btf_type, name_off)}, + sizeof(struct btf_enum), + 1, {offsetof(struct btf_enum, name_off)} + }; + break; + case BTF_KIND_ENUM64: + it->desc = (struct btf_field_desc) { + 1, {offsetof(struct btf_type, name_off)}, + sizeof(struct btf_enum64), + 1, {offsetof(struct btf_enum64, name_off)} + }; + break; + case BTF_KIND_STRUCT: + case BTF_KIND_UNION: + it->desc = (struct btf_field_desc) { + 1, {offsetof(struct btf_type, name_off)}, + sizeof(struct btf_member), + 1, {offsetof(struct btf_member, name_off)} + }; + break; + case BTF_KIND_FUNC_PROTO: + it->desc = (struct btf_field_desc) { + 1, {offsetof(struct btf_type, name_off)}, + sizeof(struct btf_param), + 1, {offsetof(struct btf_param, name_off)} + }; + break; + default: + return -EINVAL; + } + break; + default: + return -EINVAL; + } + + if (it->desc.m_sz) + it->vlen = btf_vlen(t); + + it->p = t; + return 0; +} + +__u32 *btf_field_iter_next(struct btf_field_iter *it) +{ + if (!it->p) + return NULL; + + if (it->m_idx < 0) { + if (it->off_idx < it->desc.t_off_cnt) + return it->p + it->desc.t_offs[it->off_idx++]; + /* move to per-member iteration */ + it->m_idx = 0; + it->p += sizeof(struct btf_type); + it->off_idx = 0; + } + + /* if type doesn't have members, stop */ + if (it->desc.m_sz == 0) { + it->p = NULL; + return NULL; + } + + if (it->off_idx >= 
it->desc.m_off_cnt) { + /* exhausted this member's fields, go to the next member */ + it->m_idx++; + it->p += it->desc.m_sz; + it->off_idx = 0; + } + + if (it->m_idx < it->vlen) + return it->p + it->desc.m_offs[it->off_idx++]; + + it->p = NULL; + return NULL; +} -- 2.34.1
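For context, a sketch of how a caller typically drives the iterator that this new file provides: visit every type-ID reference inside one BTF type and rewrite it through a remap table. The remap_type_ids() wrapper and id_map[] are hypothetical; only btf_field_iter_init(), btf_field_iter_next() and BTF_FIELD_ITER_IDS come from the code above.

/* Hypothetical caller: rewrite every type-ID reference of one type. */
static int remap_type_ids(struct btf_type *t, const __u32 *id_map)
{
	struct btf_field_iter it;
	__u32 *id;
	int err;

	err = btf_field_iter_init(&it, t, BTF_FIELD_ITER_IDS);
	if (err)
		return err;

	while ((id = btf_field_iter_next(&it)))
		*id = id_map[*id];	/* assumes id_map covers every referenced id */

	return 0;
}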
From: Alan Maguire <alan.maguire@oracle.com> mainline inclusion from mainline-v6.11-rc1 commit 8646db238997df36c6ad71a9d7e0b52ceee221b2 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Share relocation implementation with the kernel. As part of this, we also need the type/string iteration functions so also share btf_iter.c file. Relocation code in kernel and userspace is identical save for the impementation of the reparenting of split BTF to the relocated base BTF and retrieval of the BTF header from "struct btf"; these small functions need separate user-space and kernel implementations for the separate "struct btf"s they operate upon. One other wrinkle on the kernel side is we have to map .BTF.ids in modules as they were generated with the type ids used at BTF encoding time. btf_relocate() optionally returns an array mapping from old BTF ids to relocated ids, so we use that to fix up these references where needed for kfuncs. Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20240620091733.1967885-5-alan.maguire@oracle.com Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- include/linux/btf.h | 64 +++++++++++++ kernel/bpf/Makefile | 8 +- kernel/bpf/btf.c | 176 ++++++++++++++++++++++++----------- tools/lib/bpf/btf_iter.c | 8 ++ tools/lib/bpf/btf_relocate.c | 23 +++++ 5 files changed, 226 insertions(+), 53 deletions(-) diff --git a/include/linux/btf.h b/include/linux/btf.h index 91c55f350357..f477f7e23ca3 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -140,6 +140,7 @@ extern const struct file_operations btf_fops; const char *btf_get_name(const struct btf *btf); void btf_get(struct btf *btf); void btf_put(struct btf *btf); +const struct btf_header *btf_header(const struct btf *btf); int btf_new_fd(const union bpf_attr *attr, bpfptr_t uattr, u32 uattr_sz); struct btf *btf_get_by_fd(int fd); int btf_get_info_by_fd(const struct btf *btf, @@ -212,8 +213,10 @@ int btf_get_fd_by_id(u32 id); u32 btf_obj_id(const struct btf *btf); bool btf_is_kernel(const struct btf *btf); bool btf_is_module(const struct btf *btf); +bool btf_is_vmlinux(const struct btf *btf); struct module *btf_try_get_module(const struct btf *btf); u32 btf_nr_types(const struct btf *btf); +struct btf *btf_base_btf(const struct btf *btf); bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s, const struct btf_member *m, u32 expected_offset, u32 expected_size); @@ -339,6 +342,11 @@ static inline u8 btf_int_offset(const struct btf_type *t) return BTF_INT_OFFSET(*(u32 *)(t + 1)); } +static inline __u8 btf_int_bits(const struct btf_type *t) +{ + return BTF_INT_BITS(*(__u32 *)(t + 1)); +} + static inline bool btf_type_is_scalar(const struct btf_type *t) { return btf_type_is_int(t) || btf_type_is_enum(t); @@ -478,6 +486,11 @@ static inline struct btf_param *btf_params(const struct btf_type *t) return (struct btf_param *)(t + 1); } +static inline struct btf_decl_tag *btf_decl_tag(const struct btf_type *t) +{ + return (struct btf_decl_tag *)(t + 1); +} + static inline int btf_id_cmp_func(const void *a, const void *b) { const int *pa = a, *pb = b; @@ -515,9 +528,38 @@ static inline const struct bpf_struct_ops_desc *bpf_struct_ops_find(struct btf * } #endif +enum 
btf_field_iter_kind { + BTF_FIELD_ITER_IDS, + BTF_FIELD_ITER_STRS, +}; + +struct btf_field_desc { + /* once-per-type offsets */ + int t_off_cnt, t_offs[2]; + /* member struct size, or zero, if no members */ + int m_sz; + /* repeated per-member offsets */ + int m_off_cnt, m_offs[1]; +}; + +struct btf_field_iter { + struct btf_field_desc desc; + void *p; + int m_idx; + int off_idx; + int vlen; +}; + #ifdef CONFIG_BPF_SYSCALL const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id); +void btf_set_base_btf(struct btf *btf, const struct btf *base_btf); +int btf_relocate(struct btf *btf, const struct btf *base_btf, __u32 **map_ids); +int btf_field_iter_init(struct btf_field_iter *it, struct btf_type *t, + enum btf_field_iter_kind iter_kind); +__u32 *btf_field_iter_next(struct btf_field_iter *it); + const char *btf_name_by_offset(const struct btf *btf, u32 offset); +const char *btf_str_by_offset(const struct btf *btf, u32 offset); struct btf *btf_parse_vmlinux(void); struct btf *bpf_prog_get_target_btf(const struct bpf_prog *prog); u32 *btf_kfunc_id_set_contains(const struct btf *btf, u32 kfunc_btf_id, @@ -544,6 +586,28 @@ static inline const struct btf_type *btf_type_by_id(const struct btf *btf, { return NULL; } + +static inline void btf_set_base_btf(struct btf *btf, const struct btf *base_btf) +{ +} + +static inline int btf_relocate(void *log, struct btf *btf, const struct btf *base_btf, + __u32 **map_ids) +{ + return -EOPNOTSUPP; +} + +static inline int btf_field_iter_init(struct btf_field_iter *it, struct btf_type *t, + enum btf_field_iter_kind iter_kind) +{ + return -EOPNOTSUPP; +} + +static inline __u32 *btf_field_iter_next(struct btf_field_iter *it) +{ + return NULL; +} + static inline const char *btf_name_by_offset(const struct btf *btf, u32 offset) { diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile index f526b7573e97..f286815b5b7a 100644 --- a/kernel/bpf/Makefile +++ b/kernel/bpf/Makefile @@ -44,5 +44,11 @@ endif obj-$(CONFIG_BPF_PRELOAD) += preload/ obj-$(CONFIG_BPF_SYSCALL) += relo_core.o -$(obj)/relo_core.o: $(srctree)/tools/lib/bpf/relo_core.c FORCE +obj-$(CONFIG_BPF_SYSCALL) += btf_iter.o +obj-$(CONFIG_BPF_SYSCALL) += btf_relocate.o + +# Some source files are common to libbpf. +vpath %.c $(srctree)/kernel/bpf:$(srctree)/tools/lib/bpf + +$(obj)/%.o: %.c FORCE $(call if_changed_rule,cc_o_c) diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index ed7d054d6e68..95dea7f2ffe4 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -277,6 +277,7 @@ struct btf { u32 start_str_off; /* first string offset (0 for base BTF) */ char name[MODULE_NAME_LEN]; bool kernel_btf; + __u32 *base_id_map; /* map from distilled base BTF -> vmlinux BTF ids */ }; enum verifier_phase { @@ -533,6 +534,11 @@ static bool btf_type_is_decl_tag_target(const struct btf_type *t) btf_type_is_var(t) || btf_type_is_typedef(t); } +bool btf_is_vmlinux(const struct btf *btf) +{ + return btf->kernel_btf && !btf->base_btf; +} + u32 btf_nr_types(const struct btf *btf) { u32 total = 0; @@ -775,7 +781,7 @@ static bool __btf_name_char_ok(char c, bool first) return true; } -static const char *btf_str_by_offset(const struct btf *btf, u32 offset) +const char *btf_str_by_offset(const struct btf *btf, u32 offset) { while (offset < btf->start_str_off) btf = btf->base_btf; @@ -1668,14 +1674,8 @@ static void btf_free_kfunc_set_tab(struct btf *btf) if (!tab) return; - /* For module BTF, we directly assign the sets being registered, so - * there is nothing to free except kfunc_set_tab. 
- */ - if (btf_is_module(btf)) - goto free_tab; for (hook = 0; hook < ARRAY_SIZE(tab->sets); hook++) kfree(tab->sets[hook]); -free_tab: kfree(tab); btf->kfunc_set_tab = NULL; } @@ -1733,7 +1733,12 @@ static void btf_free(struct btf *btf) kvfree(btf->types); kvfree(btf->resolved_sizes); kvfree(btf->resolved_ids); - kvfree(btf->data); + /* vmlinux does not allocate btf->data, it simply points it at + * __start_BTF. + */ + if (!btf_is_vmlinux(btf)) + kvfree(btf->data); + kvfree(btf->base_id_map); kfree(btf); } @@ -1762,6 +1767,23 @@ void btf_put(struct btf *btf) } } +struct btf *btf_base_btf(const struct btf *btf) +{ + return btf->base_btf; +} + +const struct btf_header *btf_header(const struct btf *btf) +{ + return &btf->hdr; +} + +void btf_set_base_btf(struct btf *btf, const struct btf *base_btf) +{ + btf->base_btf = (struct btf *)base_btf; + btf->start_id = btf_nr_types(base_btf); + btf->start_str_off = base_btf->hdr.str_len; +} + static int env_resolve_init(struct btf_verifier_env *env) { struct btf *btf = env->btf; @@ -5759,23 +5781,15 @@ int get_kern_ctx_btf_id(struct bpf_verifier_log *log, enum bpf_prog_type prog_ty BTF_ID_LIST(bpf_ctx_convert_btf_id) BTF_ID(struct, bpf_ctx_convert) -struct btf *btf_parse_vmlinux(void) +static struct btf *btf_parse_base(struct btf_verifier_env *env, const char *name, + void *data, unsigned int data_size) { - struct btf_verifier_env *env = NULL; - struct bpf_verifier_log *log; struct btf *btf = NULL; int err; if (!IS_ENABLED(CONFIG_DEBUG_INFO_BTF)) return ERR_PTR(-ENOENT); - env = kzalloc(sizeof(*env), GFP_KERNEL | __GFP_NOWARN); - if (!env) - return ERR_PTR(-ENOMEM); - - log = &env->log; - log->level = BPF_LOG_KERNEL; - btf = kzalloc(sizeof(*btf), GFP_KERNEL | __GFP_NOWARN); if (!btf) { err = -ENOMEM; @@ -5783,10 +5797,10 @@ struct btf *btf_parse_vmlinux(void) } env->btf = btf; - btf->data = __start_BTF; - btf->data_size = __stop_BTF - __start_BTF; + btf->data = data; + btf->data_size = data_size; btf->kernel_btf = true; - snprintf(btf->name, sizeof(btf->name), "vmlinux"); + snprintf(btf->name, sizeof(btf->name), "%s", name); err = btf_parse_hdr(env); if (err) @@ -5806,20 +5820,11 @@ struct btf *btf_parse_vmlinux(void) if (err) goto errout; - /* btf_parse_vmlinux() runs under bpf_verifier_lock */ - bpf_ctx_convert.t = btf_type_by_id(btf, bpf_ctx_convert_btf_id[0]); - refcount_set(&btf->refcnt, 1); - err = btf_alloc_id(btf); - if (err) - goto errout; - - btf_verifier_env_free(env); return btf; errout: - btf_verifier_env_free(env); if (btf) { kvfree(btf->types); kfree(btf); @@ -5827,19 +5832,61 @@ struct btf *btf_parse_vmlinux(void) return ERR_PTR(err); } +struct btf *btf_parse_vmlinux(void) +{ + struct btf_verifier_env *env = NULL; + struct bpf_verifier_log *log; + struct btf *btf; + int err; + + env = kzalloc(sizeof(*env), GFP_KERNEL | __GFP_NOWARN); + if (!env) + return ERR_PTR(-ENOMEM); + + log = &env->log; + log->level = BPF_LOG_KERNEL; + btf = btf_parse_base(env, "vmlinux", __start_BTF, __stop_BTF - __start_BTF); + if (IS_ERR(btf)) + goto err_out; + + /* btf_parse_vmlinux() runs under bpf_verifier_lock */ + bpf_ctx_convert.t = btf_type_by_id(btf, bpf_ctx_convert_btf_id[0]); + err = btf_alloc_id(btf); + if (err) { + btf_free(btf); + btf = ERR_PTR(err); + } +err_out: + btf_verifier_env_free(env); + return btf; +} + #ifdef CONFIG_DEBUG_INFO_BTF_MODULES -static struct btf *btf_parse_module(const char *module_name, const void *data, unsigned int data_size) +/* If .BTF_ids section was created with distilled base BTF, both base and + * split BTF ids will 
need to be mapped to actual base/split ids for + * BTF now that it has been relocated. + */ +static __u32 btf_relocate_id(const struct btf *btf, __u32 id) +{ + if (!btf->base_btf || !btf->base_id_map) + return id; + return btf->base_id_map[id]; +} + +static struct btf *btf_parse_module(const char *module_name, const void *data, + unsigned int data_size, void *base_data, + unsigned int base_data_size) { + struct btf *btf = NULL, *vmlinux_btf, *base_btf = NULL; struct btf_verifier_env *env = NULL; struct bpf_verifier_log *log; - struct btf *btf = NULL, *base_btf; - int err; + int err = 0; - base_btf = bpf_get_btf_vmlinux(); - if (IS_ERR(base_btf)) - return base_btf; - if (!base_btf) + vmlinux_btf = bpf_get_btf_vmlinux(); + if (IS_ERR(vmlinux_btf)) + return vmlinux_btf; + if (!vmlinux_btf) return ERR_PTR(-EINVAL); env = kzalloc(sizeof(*env), GFP_KERNEL | __GFP_NOWARN); @@ -5849,6 +5896,16 @@ static struct btf *btf_parse_module(const char *module_name, const void *data, u log = &env->log; log->level = BPF_LOG_KERNEL; + if (base_data) { + base_btf = btf_parse_base(env, ".BTF.base", base_data, base_data_size); + if (IS_ERR(base_btf)) { + err = PTR_ERR(base_btf); + goto errout; + } + } else { + base_btf = vmlinux_btf; + } + btf = kzalloc(sizeof(*btf), GFP_KERNEL | __GFP_NOWARN); if (!btf) { err = -ENOMEM; @@ -5888,12 +5945,22 @@ static struct btf *btf_parse_module(const char *module_name, const void *data, u if (err) goto errout; + if (base_btf != vmlinux_btf) { + err = btf_relocate(btf, vmlinux_btf, &btf->base_id_map); + if (err) + goto errout; + btf_free(base_btf); + base_btf = vmlinux_btf; + } + btf_verifier_env_free(env); refcount_set(&btf->refcnt, 1); return btf; errout: btf_verifier_env_free(env); + if (base_btf != vmlinux_btf) + btf_free(base_btf); if (btf) { kvfree(btf->data); kvfree(btf->types); @@ -7682,7 +7749,8 @@ static int btf_module_notify(struct notifier_block *nb, unsigned long op, err = -ENOMEM; goto out; } - btf = btf_parse_module(mod->name, mod->btf_data, mod->btf_data_size); + btf = btf_parse_module(mod->name, mod->btf_data, mod->btf_data_size, + mod->btf_base_data, mod->btf_base_data_size); if (IS_ERR(btf)) { kfree(btf_mod); if (!IS_ENABLED(CONFIG_MODULE_ALLOW_BTF_MISMATCH)) { @@ -8013,7 +8081,7 @@ static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook, bool add_filter = !!kset->filter; struct btf_kfunc_set_tab *tab; struct btf_id_set8 *set; - u32 set_cnt; + u32 set_cnt, i; int ret; if (hook >= BTF_KFUNC_HOOK_MAX) { @@ -8059,21 +8127,15 @@ static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook, goto end; } - /* We don't need to allocate, concatenate, and sort module sets, because - * only one is allowed per hook. Hence, we can directly assign the - * pointer and return. - */ - if (!vmlinux_set) { - tab->sets[hook] = add_set; - goto do_add_filter; - } - /* In case of vmlinux sets, there may be more than one set being * registered per hook. To create a unified set, we allocate a new set * and concatenate all individual sets being registered. While each set * is individually sorted, they may become unsorted when concatenated, * hence re-sorting the final set again is required to make binary * searching the set using btf_id_set8_contains function work. + * + * For module sets, we need to allocate as we may need to relocate + * BTF ids. */ set_cnt = set ? 
set->cnt : 0; @@ -8103,11 +8165,14 @@ static int btf_populate_kfunc_set(struct btf *btf, enum btf_kfunc_hook hook, /* Concatenate the two sets */ memcpy(set->pairs + set->cnt, add_set->pairs, add_set->cnt * sizeof(set->pairs[0])); + /* Now that the set is copied, update with relocated BTF ids */ + for (i = set->cnt; i < set->cnt + add_set->cnt; i++) + set->pairs[i].id = btf_relocate_id(btf, set->pairs[i].id); + set->cnt += add_set->cnt; sort(set->pairs, set->cnt, sizeof(set->pairs[0]), btf_id_cmp_func, NULL); -do_add_filter: if (add_filter) { hook_filter = &tab->hook_filters[hook]; hook_filter->filters[hook_filter->nr_filters++] = kset->filter; @@ -8238,7 +8303,7 @@ static int __register_btf_kfunc_id_set(enum btf_kfunc_hook hook, return PTR_ERR(btf); for (i = 0; i < kset->set->cnt; i++) { - ret = btf_check_kfunc_protos(btf, kset->set->pairs[i].id, + ret = btf_check_kfunc_protos(btf, btf_relocate_id(btf, kset->set->pairs[i].id), kset->set->pairs[i].flags); if (ret) goto err_out; @@ -8302,7 +8367,7 @@ static int btf_check_dtor_kfuncs(struct btf *btf, const struct btf_id_dtor_kfunc u32 nr_args, i; for (i = 0; i < cnt; i++) { - dtor_btf_id = dtors[i].kfunc_btf_id; + dtor_btf_id = btf_relocate_id(btf, dtors[i].kfunc_btf_id); dtor_func = btf_type_by_id(btf, dtor_btf_id); if (!dtor_func || !btf_type_is_func(dtor_func)) @@ -8337,7 +8402,7 @@ int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors, u32 add_c { struct btf_id_dtor_kfunc_tab *tab; struct btf *btf; - u32 tab_cnt; + u32 tab_cnt, i; int ret; btf = btf_get_module_btf(owner); @@ -8388,6 +8453,13 @@ int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors, u32 add_c btf->dtor_kfunc_tab = tab; memcpy(tab->dtors + tab->cnt, dtors, add_cnt * sizeof(tab->dtors[0])); + + /* remap BTF ids based on BTF relocation (if any) */ + for (i = tab_cnt; i < tab_cnt + add_cnt; i++) { + tab->dtors[i].btf_id = btf_relocate_id(btf, tab->dtors[i].btf_id); + tab->dtors[i].kfunc_btf_id = btf_relocate_id(btf, tab->dtors[i].kfunc_btf_id); + } + tab->cnt += add_cnt; sort(tab->dtors, tab->cnt, sizeof(tab->dtors[0]), btf_id_cmp_func, NULL); diff --git a/tools/lib/bpf/btf_iter.c b/tools/lib/bpf/btf_iter.c index c308aa60285d..9a6c822c2294 100644 --- a/tools/lib/bpf/btf_iter.c +++ b/tools/lib/bpf/btf_iter.c @@ -2,8 +2,16 @@ /* Copyright (c) 2021 Facebook */ /* Copyright (c) 2024, Oracle and/or its affiliates. 
*/ +#ifdef __KERNEL__ +#include <linux/bpf.h> +#include <linux/btf.h> + +#define btf_var_secinfos(t) (struct btf_var_secinfo *)btf_type_var_secinfo(t) + +#else #include "btf.h" #include "libbpf_internal.h" +#endif int btf_field_iter_init(struct btf_field_iter *it, struct btf_type *t, enum btf_field_iter_kind iter_kind) diff --git a/tools/lib/bpf/btf_relocate.c b/tools/lib/bpf/btf_relocate.c index 23a41fb03e0d..2281dbbafa11 100644 --- a/tools/lib/bpf/btf_relocate.c +++ b/tools/lib/bpf/btf_relocate.c @@ -5,11 +5,34 @@ #define _GNU_SOURCE #endif +#ifdef __KERNEL__ +#include <linux/bpf.h> +#include <linux/bsearch.h> +#include <linux/btf.h> +#include <linux/sort.h> +#include <linux/string.h> +#include <linux/bpf_verifier.h> + +#define btf_type_by_id (struct btf_type *)btf_type_by_id +#define btf__type_cnt btf_nr_types +#define btf__base_btf btf_base_btf +#define btf__name_by_offset btf_name_by_offset +#define btf__str_by_offset btf_str_by_offset +#define btf_kflag btf_type_kflag + +#define calloc(nmemb, sz) kvcalloc(nmemb, sz, GFP_KERNEL | __GFP_NOWARN) +#define free(ptr) kvfree(ptr) +#define qsort(base, num, sz, cmp) sort(base, num, sz, cmp, NULL) + +#else + #include "btf.h" #include "bpf.h" #include "libbpf.h" #include "libbpf_internal.h" +#endif /* __KERNEL__ */ + struct btf; struct btf_relocate { -- 2.34.1
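A compact illustration of why the .BTF_ids remapping described above is needed at all: ids baked into a module's BTF ID sets were resolved against the (distilled base + split) numbering used at encoding time, and base_id_map translates them into the (vmlinux + split) numbering that exists after btf_relocate(). The numbers and the relocate_id() helper below are invented for illustration; only the concept matches btf_relocate_id() in the diff.

#include <stdio.h>

/* index = id at encoding time, value = id after relocation (made-up data) */
static const unsigned int base_id_map[] = { 0, 57, 213, 4711 };

static unsigned int relocate_id(const unsigned int *map, unsigned int old_id)
{
	return map ? map[old_id] : old_id;
}

int main(void)
{
	unsigned int kfunc_btf_id = 3;	/* as recorded in the module's BTF ID set */

	printf("kfunc id %u -> %u after relocation\n",
	       kfunc_btf_id, relocate_id(base_id_map, kfunc_btf_id));
	return 0;
}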
From: Alan Maguire <alan.maguire@oracle.com> mainline inclusion from mainline-v6.11-rc1 commit 46fb0b62ea29c0dbcb3e44f1d67aafe79bc6e045 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Support creation of module BTF along with distilled base BTF; the latter is stored in a .BTF.base ELF section and supplements split BTF references to base BTF with information about base types, allowing for later relocation of split BTF with a (possibly changed) base. resolve_btfids detects the presence of a .BTF.base section and will use it instead of the base BTF it is passed in BTF id resolution. Modules will be built with a distilled .BTF.base section for external module build, i.e. make -C. -M=path2/module ...while in-tree module build as part of a normal kernel build will not generate distilled base BTF; this is because in-tree modules change with the kernel and do not require BTF relocation for the running vmlinux. Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20240620091733.1967885-6-alan.maguire@oracle.com Conflicts: scripts/Makefile.btf [The conflicts were due to not merge commit ebb79e96f1ea.] Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- scripts/Makefile.btf | 5 +++++ scripts/Makefile.modfinal | 2 +- 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/scripts/Makefile.btf b/scripts/Makefile.btf index bca8a8f26ea4..191b4903e864 100644 --- a/scripts/Makefile.btf +++ b/scripts/Makefile.btf @@ -21,8 +21,13 @@ else # Switch to using --btf_features for v1.26 and later. pahole-flags-$(call test-ge, $(pahole-ver), 126) = -j --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func +ifneq ($(KBUILD_EXTMOD),) +module-pahole-flags-$(call test-ge, $(pahole-ver), 126) += --btf_features=distilled_base +endif + endif pahole-flags-$(CONFIG_PAHOLE_HAS_LANG_EXCLUDE) += --lang_exclude=rust export PAHOLE_FLAGS := $(pahole-flags-y) +export MODULE_PAHOLE_FLAGS := $(module-pahole-flags-y) diff --git a/scripts/Makefile.modfinal b/scripts/Makefile.modfinal index 1979913aff68..4e07b3501568 100644 --- a/scripts/Makefile.modfinal +++ b/scripts/Makefile.modfinal @@ -42,7 +42,7 @@ quiet_cmd_btf_ko = BTF [M] $@ if [ ! -f vmlinux ]; then \ printf "Skipping BTF generation for %s due to unavailability of vmlinux\n" $@ 1>&2; \ else \ - LLVM_OBJCOPY="$(OBJCOPY)" $(PAHOLE) -J $(PAHOLE_FLAGS) --btf_base vmlinux $@; \ + LLVM_OBJCOPY="$(OBJCOPY)" $(PAHOLE) -J $(PAHOLE_FLAGS) $(MODULE_PAHOLE_FLAGS) --btf_base vmlinux $@; \ $(RESOLVE_BTFIDS) -b vmlinux $@; \ fi; -- 2.34.1
From: Alan Maguire <alan.maguire@oracle.com> mainline inclusion from mainline-v6.11-rc1 commit 47a8cf0c5b3f6769b9d558301735c75119a0a165 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- add simple kfuncs to create/destroy a context type to bpf_testmod, register them and add a kfunc_call test to use them. This provides test coverage for registration of dtor kfuncs from modules. By transferring the context pointer to a map value as a __kptr we also trigger the map-based dtor cleanup logic, improving test coverage. Suggested-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240620091733.1967885-7-alan.maguire@oracle.com Conflicts: tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c tools/testing/selftests/bpf/bpf_testmod/bpf_testmod_kfunc.h [The conflicts were due to some minor issue.] Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- .../selftests/bpf/bpf_testmod/bpf_testmod.c | 46 +++++++++++++++++++ .../bpf/bpf_testmod/bpf_testmod_kfunc.h | 9 ++++ .../selftests/bpf/prog_tests/kfunc_call.c | 1 + .../selftests/bpf/progs/kfunc_call_test.c | 37 +++++++++++++++ 4 files changed, 93 insertions(+) diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c index 2c5c64950781..700bba39a22c 100644 --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c @@ -138,6 +138,37 @@ __bpf_kfunc void bpf_iter_testmod_seq_destroy(struct bpf_iter_testmod_seq *it) it->cnt = 0; } +__bpf_kfunc struct bpf_testmod_ctx * +bpf_testmod_ctx_create(int *err) +{ + struct bpf_testmod_ctx *ctx; + + ctx = kzalloc(sizeof(*ctx), GFP_ATOMIC); + if (!ctx) { + *err = -ENOMEM; + return NULL; + } + refcount_set(&ctx->usage, 1); + + return ctx; +} + +static void testmod_free_cb(struct rcu_head *head) +{ + struct bpf_testmod_ctx *ctx; + + ctx = container_of(head, struct bpf_testmod_ctx, rcu); + kfree(ctx); +} + +__bpf_kfunc void bpf_testmod_ctx_release(struct bpf_testmod_ctx *ctx) +{ + if (!ctx) + return; + if (refcount_dec_and_test(&ctx->usage)) + call_rcu(&ctx->rcu, testmod_free_cb); +} + struct bpf_testmod_btf_type_tag_1 { int a; }; @@ -343,8 +374,14 @@ BTF_KFUNCS_START(bpf_testmod_common_kfunc_ids) BTF_ID_FLAGS(func, bpf_iter_testmod_seq_new, KF_ITER_NEW) BTF_ID_FLAGS(func, bpf_iter_testmod_seq_next, KF_ITER_NEXT | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_iter_testmod_seq_destroy, KF_ITER_DESTROY) +BTF_ID_FLAGS(func, bpf_testmod_ctx_create, KF_ACQUIRE | KF_RET_NULL) +BTF_ID_FLAGS(func, bpf_testmod_ctx_release, KF_RELEASE) BTF_KFUNCS_END(bpf_testmod_common_kfunc_ids) +BTF_ID_LIST(bpf_testmod_dtor_ids) +BTF_ID(struct, bpf_testmod_ctx) +BTF_ID(func, bpf_testmod_ctx_release) + static const struct btf_kfunc_id_set bpf_testmod_common_kfunc_set = { .owner = THIS_MODULE, .set = &bpf_testmod_common_kfunc_ids, @@ -608,6 +645,12 @@ extern int bpf_fentry_test1(int a); static int bpf_testmod_init(void) { + const struct btf_id_dtor_kfunc bpf_testmod_dtors[] = { + { + .btf_id = bpf_testmod_dtor_ids[0], + .kfunc_btf_id = bpf_testmod_dtor_ids[1] + }, + }; int ret; ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_UNSPEC, &bpf_testmod_common_kfunc_set); @@ -615,6 +658,9 @@ static int 
bpf_testmod_init(void) ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_testmod_kfunc_set); ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SYSCALL, &bpf_testmod_kfunc_set); ret = ret ?: register_bpf_struct_ops(&bpf_bpf_testmod_ops, bpf_testmod_ops); + ret = ret ?: register_btf_id_dtor_kfuncs(bpf_testmod_dtors, + ARRAY_SIZE(bpf_testmod_dtors), + THIS_MODULE); if (ret < 0) return ret; if (bpf_fentry_test1(0) < 0) diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod_kfunc.h b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod_kfunc.h index f5c5b1375c24..9e0e6821539a 100644 --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod_kfunc.h +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod_kfunc.h @@ -64,6 +64,11 @@ struct prog_test_fail3 { char arr2[]; }; +struct bpf_testmod_ctx { + struct callback_head rcu; + refcount_t usage; +}; + struct prog_test_ref_kfunc * bpf_kfunc_call_test_acquire(unsigned long *scalar_ptr) __ksym; void bpf_kfunc_call_test_release(struct prog_test_ref_kfunc *p) __ksym; @@ -104,4 +109,8 @@ void bpf_kfunc_call_test_fail1(struct prog_test_fail1 *p); void bpf_kfunc_call_test_fail2(struct prog_test_fail2 *p); void bpf_kfunc_call_test_fail3(struct prog_test_fail3 *p); void bpf_kfunc_call_test_mem_len_fail1(void *mem, int len); + +struct bpf_testmod_ctx *bpf_testmod_ctx_create(int *err) __ksym; +void bpf_testmod_ctx_release(struct bpf_testmod_ctx *ctx) __ksym; + #endif /* _BPF_TESTMOD_KFUNC_H */ diff --git a/tools/testing/selftests/bpf/prog_tests/kfunc_call.c b/tools/testing/selftests/bpf/prog_tests/kfunc_call.c index 2eb71559713c..5b743212292f 100644 --- a/tools/testing/selftests/bpf/prog_tests/kfunc_call.c +++ b/tools/testing/selftests/bpf/prog_tests/kfunc_call.c @@ -78,6 +78,7 @@ static struct kfunc_test_params kfunc_tests[] = { SYSCALL_TEST(kfunc_syscall_test, 0), SYSCALL_NULL_CTX_TEST(kfunc_syscall_test_null, 0), TC_TEST(kfunc_call_test_static_unused_arg, 0), + TC_TEST(kfunc_call_ctx, 0), }; struct syscall_test_args { diff --git a/tools/testing/selftests/bpf/progs/kfunc_call_test.c b/tools/testing/selftests/bpf/progs/kfunc_call_test.c index cf68d1e48a0f..f502f755f567 100644 --- a/tools/testing/selftests/bpf/progs/kfunc_call_test.c +++ b/tools/testing/selftests/bpf/progs/kfunc_call_test.c @@ -177,4 +177,41 @@ int kfunc_call_test_static_unused_arg(struct __sk_buff *skb) return actual != expected ? -1 : 0; } +struct ctx_val { + struct bpf_testmod_ctx __kptr *ctx; +}; + +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __uint(max_entries, 1); + __type(key, int); + __type(value, struct ctx_val); +} ctx_map SEC(".maps"); + +SEC("tc") +int kfunc_call_ctx(struct __sk_buff *skb) +{ + struct bpf_testmod_ctx *ctx; + int err = 0; + + ctx = bpf_testmod_ctx_create(&err); + if (!ctx && !err) + err = -1; + if (ctx) { + int key = 0; + struct ctx_val *ctx_val = bpf_map_lookup_elem(&ctx_map, &key); + + /* Transfer ctx to map to be freed via implicit dtor call + * on cleanup. + */ + if (ctx_val) + ctx = bpf_kptr_xchg(&ctx_val->ctx, ctx); + if (ctx) { + bpf_testmod_ctx_release(ctx); + err = -1; + } + } + return err; +} + char _license[] SEC("license") = "GPL"; -- 2.34.1
From: Alan Maguire <alan.maguire@oracle.com> mainline inclusion from mainline-v6.11-rc1 commit 5a532459aa919d055d822d8db4ea2c5c8d511568 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Kernel test robot reports that kernel build fails with resilient split BTF changes. Examining the associated config and code we see that btf_relocate_id() is defined under CONFIG_DEBUG_INFO_BTF_MODULES. Moving it outside the #ifdef solves the issue. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202406221742.d2srFLVI-lkp@intel.com/ Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Link: https://lore.kernel.org/r/20240623135224.27981-1-alan.maguire@oracle.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- kernel/bpf/btf.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index 95dea7f2ffe4..3fe2b93b50bc 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -5861,8 +5861,6 @@ struct btf *btf_parse_vmlinux(void) return btf; } -#ifdef CONFIG_DEBUG_INFO_BTF_MODULES - /* If .BTF_ids section was created with distilled base BTF, both base and * split BTF ids will need to be mapped to actual base/split ids for * BTF now that it has been relocated. @@ -5874,6 +5872,8 @@ static __u32 btf_relocate_id(const struct btf *btf, __u32 id) return btf->base_id_map[id]; } +#ifdef CONFIG_DEBUG_INFO_BTF_MODULES + static struct btf *btf_parse_module(const char *module_name, const void *data, unsigned int data_size, void *base_data, unsigned int base_data_size) -- 2.34.1
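The failure mode here is the usual one of a helper whose definition is guarded by a config option while a caller is compiled unconditionally. A minimal, generic illustration (FEATURE_X, helper() and caller() are invented names, not kernel symbols):

/* Broken layout: helper() vanishes when FEATURE_X is not set, but the
 * caller below is built either way, so the build fails.
 *
 * #ifdef FEATURE_X
 * static unsigned int helper(unsigned int id) { return id; }
 * #endif
 *
 * Fixed layout: keep the definition outside the #ifdef, as the patch does
 * for btf_relocate_id().
 */
static unsigned int helper(unsigned int id)
{
	return id;
}

unsigned int caller(unsigned int id)
{
	return helper(id);	/* compiled regardless of FEATURE_X */
}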
From: Alan Maguire <alan.maguire@oracle.com> mainline inclusion from mainline-v6.11-rc1 commit 5b747c23f17d791e08fdf4baa7e14b704625518c category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Coverity points out that after calling btf__new_empty_split() the wrong value is checked for error. Fixes: 58e185a0dc35 ("libbpf: Add btf__distill_base() creating split BTF with distilled base BTF") Reported-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240629100058.2866763-1-alan.maguire@oracle.com Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- tools/lib/bpf/btf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c index db8cfaeba6f7..ed64b5c207b8 100644 --- a/tools/lib/bpf/btf.c +++ b/tools/lib/bpf/btf.c @@ -5263,7 +5263,7 @@ int btf__distill_base(const struct btf *src_btf, struct btf **new_base_btf, * BTF available. */ new_split = btf__new_empty_split(new_base); - if (!new_split_btf) { + if (!new_split) { err = -errno; goto done; } -- 2.34.1
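For callers of the public API the fix is invisible; errors still surface as a negative return code. A hedged usage sketch, based on my reading of the btf__distill_base() contract introduced earlier in this series: 0 on success, negative errno-style code on failure, and the caller frees both output objects with btf__free().

#include <bpf/btf.h>

/* Distill src into a minimal base plus a split BTF reparented onto it. */
static int distill(const struct btf *src, struct btf **base, struct btf **split)
{
	int err = btf__distill_base(src, base, split);

	if (err)		/* negative errno-style code */
		return err;

	/* *base and *split are now owned by the caller and must eventually
	 * be released with btf__free().
	 */
	return 0;
}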
From: Pu Lehui <pulehui@huawei.com> mainline inclusion from mainline-v6.11-rc1 commit 9f1e16fb1fc9826001c69e0551d51fbbcd2d74e9 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- We get the size of the trampoline image during the dry run phase and allocate memory based on that size. The allocated image will then be populated with instructions during the real patch phase. But after commit 26ef208c209a ("bpf: Use arch_bpf_trampoline_size"), the `im` argument is inconsistent in the dry run and real patch phase. This may cause emit_imm in RV64 to generate a different number of instructions when generating the 'im' address, potentially causing out-of-bounds issues. Let's emit the maximum number of instructions for the "im" address during dry run to fix this problem. Fixes: 26ef208c209a ("bpf: Use arch_bpf_trampoline_size") Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240622030437.3973492-3-pulehui@huaweicloud.com Conflicts: arch/riscv/net/bpf_jit_comp64.c [Fix conflict because of context diff.] Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- arch/riscv/net/bpf_jit_comp64.c | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c index 4ae7e6e7c061..a509fab76c84 100644 --- a/arch/riscv/net/bpf_jit_comp64.c +++ b/arch/riscv/net/bpf_jit_comp64.c @@ -15,6 +15,8 @@ #define RV_FENTRY_NINSNS 2 #define RV_FENTRY_NBYTES (RV_FENTRY_NINSNS * 4) +/* imm that allows emit_imm to emit max count insns */ +#define RV_MAX_COUNT_IMM 0x7FFF7FF7FF7FF7FF #define RV_REG_TCC RV_REG_A6 #define RV_REG_TCC_SAVED RV_REG_S6 /* Store A6 in S6 if program do calls */ @@ -922,7 +924,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, orig_call += RV_FENTRY_NINSNS * 4; if (flags & BPF_TRAMP_F_CALL_ORIG) { - emit_imm(RV_REG_A0, (const s64)im, ctx); + emit_imm(RV_REG_A0, ctx->insns ? (const s64)im : RV_MAX_COUNT_IMM, ctx); ret = emit_call((const u64)__bpf_tramp_enter, true, ctx); if (ret) return ret; @@ -983,7 +985,7 @@ static int __arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, if (flags & BPF_TRAMP_F_CALL_ORIG) { im->ip_epilogue = ctx->insns + ctx->ninsns; - emit_imm(RV_REG_A0, (const s64)im, ctx); + emit_imm(RV_REG_A0, ctx->insns ? (const s64)im : RV_MAX_COUNT_IMM, ctx); ret = emit_call((const u64)__bpf_tramp_exit, true, ctx); if (ret) goto out; @@ -1052,6 +1054,7 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, { int ret; struct rv_jit_context ctx; + u32 size = image_end - image; ctx.ninsns = 0; /* @@ -1065,11 +1068,16 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, ctx.ro_insns = image; ret = __arch_prepare_bpf_trampoline(im, m, tlinks, func_addr, flags, &ctx); if (ret < 0) - return ret; + goto out; - bpf_flush_icache(ctx.insns, ctx.insns + ctx.ninsns); + if (WARN_ON(size < ninsns_rvoff(ctx.ninsns))) { + ret = -E2BIG; + goto out; + } - return ninsns_rvoff(ret); + bpf_flush_icache(image, image_end); +out: + return ret < 0 ? ret : size; } int bpf_jit_emit_insn(const struct bpf_insn *insn, struct rv_jit_context *ctx, -- 2.34.1
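The underlying hazard is generic to two-pass JITs: the instruction count needed to materialize an immediate depends on its value, and during the dry run the final image address is not yet known. A toy, self-contained model of the fix; RV_MAX_COUNT_IMM is represented by a generic worst-case immediate, and the "16 bits per instruction" encoding below is invented purely for illustration.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define TOY_MAX_IMM INT64_MAX	/* stand-in for a worst-case immediate */

/* Emits (or merely counts, when insns == NULL) one toy "instruction" per
 * 16 bits of a non-negative immediate.
 */
static int emit_imm(uint32_t *insns, int pos, int64_t imm)
{
	int count = 0;

	do {
		if (insns)
			insns[pos + count] = (uint32_t)(imm & 0xffff);
		imm >>= 16;
		count++;
	} while (imm);

	return count;
}

int main(void)
{
	uint32_t buf[8];
	/* Pass 1 (dry run): assume the worst case so the count is an upper bound. */
	int sized = emit_imm(NULL, 0, TOY_MAX_IMM);
	/* Pass 2 (real emit): the actual address may need fewer instructions. */
	int used = emit_imm(buf, 0, 0x7f0000);

	assert(used <= sized);
	printf("reserved %d, used %d\n", sized, used);
	return 0;
}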
From: Alan Maguire <alan.maguire@oracle.com> mainline inclusion from mainline-v6.12-rc1 commit 4a4c013d3385b0db85dc361203dc42ff048b6fd6 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- License should be // SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) ...as with other libbpf files. Fixes: 19e00c897d50 ("libbpf: Split BTF relocation") Reported-by: Neill Kapron <nkapron@google.com> Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/20240810093504.2111134-1-alan.maguire@oracle.com Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- tools/lib/bpf/btf_relocate.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/lib/bpf/btf_relocate.c b/tools/lib/bpf/btf_relocate.c index 2281dbbafa11..6c9a16115bf0 100644 --- a/tools/lib/bpf/btf_relocate.c +++ b/tools/lib/bpf/btf_relocate.c @@ -1,4 +1,4 @@ -// SPDX-License-Identifier: GPL-2.0 +// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) /* Copyright (c) 2024, Oracle and/or its affiliates. */ #ifndef _GNU_SOURCE -- 2.34.1
From: Andrii Nakryiko <andrii@kernel.org> mainline inclusion from mainline-v6.12-rc1 commit 496ddd19a0fad22f250fc7a7b7a8000155418934 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Verifier enforces that all iterator structs are named `bpf_iter_<name>` and that whenever iterator is passed to a kfunc it's passed as a valid PTR -> STRUCT chain (with potentially const modifiers in between). We'll need this check for upcoming changes, so instead of duplicating the logic, extract it into a helper function. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20240808232230.2848712-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- include/linux/btf.h | 5 +++++ kernel/bpf/btf.c | 50 ++++++++++++++++++++++++++++++++------------- 2 files changed, 41 insertions(+), 14 deletions(-) diff --git a/include/linux/btf.h b/include/linux/btf.h index f477f7e23ca3..3dad546d37aa 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -580,6 +580,7 @@ btf_get_prog_ctx_type(struct bpf_verifier_log *log, const struct btf *btf, int get_kern_ctx_btf_id(struct bpf_verifier_log *log, enum bpf_prog_type prog_type); bool btf_types_are_same(const struct btf *btf1, u32 id1, const struct btf *btf2, u32 id2); +int btf_check_iter_arg(struct btf *btf, const struct btf_type *func, int arg_idx); #else static inline const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id) @@ -654,6 +655,10 @@ static inline bool btf_types_are_same(const struct btf *btf1, u32 id1, { return false; } +static inline int btf_check_iter_arg(struct btf *btf, const struct btf_type *func, int arg_idx) +{ + return -EOPNOTSUPP; +} #endif static inline bool btf_type_is_struct_ptr(struct btf *btf, const struct btf_type *t) diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index 3fe2b93b50bc..a3918dd19331 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -7968,15 +7968,44 @@ BTF_ID_LIST_GLOBAL(btf_tracing_ids, MAX_BTF_TRACING_TYPE) BTF_TRACING_TYPE_xxx #undef BTF_TRACING_TYPE +/* Validate well-formedness of iter argument type. + * On success, return positive BTF ID of iter state's STRUCT type. + * On error, negative error is returned. 
+ */ +int btf_check_iter_arg(struct btf *btf, const struct btf_type *func, int arg_idx) +{ + const struct btf_param *arg; + const struct btf_type *t; + const char *name; + int btf_id; + + if (btf_type_vlen(func) <= arg_idx) + return -EINVAL; + + arg = &btf_params(func)[arg_idx]; + t = btf_type_skip_modifiers(btf, arg->type, NULL); + if (!t || !btf_type_is_ptr(t)) + return -EINVAL; + t = btf_type_skip_modifiers(btf, t->type, &btf_id); + if (!t || !__btf_type_is_struct(t)) + return -EINVAL; + + name = btf_name_by_offset(btf, t->name_off); + if (!name || strncmp(name, ITER_PREFIX, sizeof(ITER_PREFIX) - 1)) + return -EINVAL; + + return btf_id; +} + static int btf_check_iter_kfuncs(struct btf *btf, const char *func_name, const struct btf_type *func, u32 func_flags) { u32 flags = func_flags & (KF_ITER_NEW | KF_ITER_NEXT | KF_ITER_DESTROY); - const char *name, *sfx, *iter_name; - const struct btf_param *arg; + const char *sfx, *iter_name; const struct btf_type *t; char exp_name[128]; u32 nr_args; + int btf_id; /* exactly one of KF_ITER_{NEW,NEXT,DESTROY} can be set */ if (!flags || (flags & (flags - 1))) @@ -7987,28 +8016,21 @@ static int btf_check_iter_kfuncs(struct btf *btf, const char *func_name, if (nr_args < 1) return -EINVAL; - arg = &btf_params(func)[0]; - t = btf_type_skip_modifiers(btf, arg->type, NULL); - if (!t || !btf_type_is_ptr(t)) - return -EINVAL; - t = btf_type_skip_modifiers(btf, t->type, NULL); - if (!t || !__btf_type_is_struct(t)) - return -EINVAL; - - name = btf_name_by_offset(btf, t->name_off); - if (!name || strncmp(name, ITER_PREFIX, sizeof(ITER_PREFIX) - 1)) - return -EINVAL; + btf_id = btf_check_iter_arg(btf, func, 0); + if (btf_id < 0) + return btf_id; /* sizeof(struct bpf_iter_<type>) should be a multiple of 8 to * fit nicely in stack slots */ + t = btf_type_by_id(btf, btf_id); if (t->size == 0 || (t->size % 8)) return -EINVAL; /* validate bpf_iter_<type>_{new,next,destroy}(struct bpf_iter_<type> *) * naming pattern */ - iter_name = name + sizeof(ITER_PREFIX) - 1; + iter_name = btf_name_by_offset(btf, t->name_off) + sizeof(ITER_PREFIX) - 1; if (flags & KF_ITER_NEW) sfx = "new"; else if (flags & KF_ITER_NEXT) -- 2.34.1
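For reference, the shape that btf_check_iter_arg() and the iterator-kfunc checks built on it expect, shown with a hypothetical "foo" iterator; none of these symbols exist in the kernel, they only illustrate the naming and size conventions being validated.

/* State struct must be named bpf_iter_<type> and sized in multiples of 8. */
struct bpf_iter_foo {
	__u64 __opaque[2];
} __aligned(8);

/* Constructor/next/destructor trio; the real definitions would carry
 * __bpf_kfunc and be registered with KF_ITER_NEW, KF_ITER_NEXT | KF_RET_NULL
 * and KF_ITER_DESTROY respectively. Each takes a pointer to the iterator
 * state as its first argument.
 */
int bpf_iter_foo_new(struct bpf_iter_foo *it, int start);
int *bpf_iter_foo_next(struct bpf_iter_foo *it);
void bpf_iter_foo_destroy(struct bpf_iter_foo *it);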
From: Andrii Nakryiko <andrii@kernel.org> mainline inclusion from mainline-v6.12-rc1 commit baebe9aaba1e59e34cd1fe6455bb4c3029ad3bc1 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- There are potentially useful cases where a specific iterator type might need to be passed into some kfunc. So, in addition to existing bpf_iter_<type>_{new,next,destroy}() kfuncs, allow to pass iterator pointer to any kfunc. We employ "__iter" naming suffix for arguments that are meant to accept iterators. We also enforce that they accept PTR -> STRUCT btf_iter_<type> type chain and point to a valid initialized on-the-stack iterator state. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240808232230.2848712-3-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- kernel/bpf/verifier.c | 35 ++++++++++++++++++++++++----------- 1 file changed, 24 insertions(+), 11 deletions(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 5ba537821d49..1f188439c712 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -7839,12 +7839,17 @@ static bool is_iter_destroy_kfunc(struct bpf_kfunc_call_arg_meta *meta) return meta->kfunc_flags & KF_ITER_DESTROY; } -static bool is_kfunc_arg_iter(struct bpf_kfunc_call_arg_meta *meta, int arg) +static bool is_kfunc_arg_iter(struct bpf_kfunc_call_arg_meta *meta, int arg_idx, + const struct btf_param *arg) { /* btf_check_iter_kfuncs() guarantees that first argument of any iter * kfunc is iter state pointer */ - return arg == 0 && is_iter_kfunc(meta); + if (is_iter_kfunc(meta)) + return arg_idx == 0; + + /* iter passed as an argument to a generic kfunc */ + return btf_param_match_suffix(meta->btf, arg, "__iter"); } static int process_iter_arg(struct bpf_verifier_env *env, int regno, int insn_idx, @@ -7852,14 +7857,20 @@ static int process_iter_arg(struct bpf_verifier_env *env, int regno, int insn_id { struct bpf_reg_state *regs = cur_regs(env), *reg = ®s[regno]; const struct btf_type *t; - const struct btf_param *arg; - int spi, err, i, nr_slots; - u32 btf_id; + int spi, err, i, nr_slots, btf_id; - /* btf_check_iter_kfuncs() ensures we don't need to validate anything here */ - arg = &btf_params(meta->func_proto)[0]; - t = btf_type_skip_modifiers(meta->btf, arg->type, NULL); /* PTR */ - t = btf_type_skip_modifiers(meta->btf, t->type, &btf_id); /* STRUCT */ + /* For iter_{new,next,destroy} functions, btf_check_iter_kfuncs() + * ensures struct convention, so we wouldn't need to do any BTF + * validation here. But given iter state can be passed as a parameter + * to any kfunc, if arg has "__iter" suffix, we need to be a bit more + * conservative here. 
+ */ + btf_id = btf_check_iter_arg(meta->btf, meta->func_proto, regno - 1); + if (btf_id < 0) { + verbose(env, "expected valid iter pointer as arg #%d\n", regno); + return -EINVAL; + } + t = btf_type_by_id(meta->btf, btf_id); nr_slots = t->size / BPF_REG_SIZE; if (is_iter_new_kfunc(meta)) { @@ -7881,7 +7892,9 @@ static int process_iter_arg(struct bpf_verifier_env *env, int regno, int insn_id if (err) return err; } else { - /* iter_next() or iter_destroy() expect initialized iter state*/ + /* iter_next() or iter_destroy(), as well as any kfunc + * accepting iter argument, expect initialized iter state + */ err = is_iter_reg_valid_init(env, reg, meta->btf, btf_id, nr_slots); switch (err) { case 0: @@ -10993,7 +11006,7 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env, if (is_kfunc_arg_dynptr(meta->btf, &args[argno])) return KF_ARG_PTR_TO_DYNPTR; - if (is_kfunc_arg_iter(meta, argno)) + if (is_kfunc_arg_iter(meta, argno, &args[argno])) return KF_ARG_PTR_TO_ITER; if (is_kfunc_arg_list_head(meta->btf, &args[argno])) -- 2.34.1
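A hypothetical kfunc using the new convention; bpf_foo_sum() and struct bpf_iter_foo are invented names, shown only to make the point that the iterator no longer has to be the first argument and that the "__iter" suffix is what routes the argument through process_iter_arg().

/* The verifier maps the "__iter" suffix to KF_ARG_PTR_TO_ITER and requires
 * it__iter to point at an initialized bpf_iter_foo on the program stack;
 * only the signature matters for this illustration.
 */
__bpf_kfunc s64 bpf_foo_sum(int scale, struct bpf_iter_foo *it__iter)
{
	return scale;	/* placeholder body */
}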
From: Andrii Nakryiko <andrii@kernel.org> mainline inclusion from mainline-v6.12-rc1 commit b0cd726f9a8279b33faa5b4b0011df14f331fb33 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Define BPF iterator "getter" kfunc, which accepts iterator pointer as one of the arguments. Make sure that argument passed doesn't have to be the very first argument (unlike new-next-destroy combo). Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240808232230.2848712-4-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Conflicts: tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c [Fix conflicts caused by context diff.] Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- .../selftests/bpf/bpf_testmod/bpf_testmod.c | 16 ++++-- .../selftests/bpf/progs/iters_testmod_seq.c | 50 +++++++++++++++++++ 2 files changed, 62 insertions(+), 4 deletions(-) diff --git a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c index 700bba39a22c..72d26adba834 100644 --- a/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c +++ b/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod.c @@ -112,13 +112,12 @@ bpf_testmod_test_mod_kfunc(int i) __bpf_kfunc int bpf_iter_testmod_seq_new(struct bpf_iter_testmod_seq *it, s64 value, int cnt) { - if (cnt < 0) { - it->cnt = 0; + it->cnt = cnt; + + if (cnt < 0) return -EINVAL; - } it->value = value; - it->cnt = cnt; return 0; } @@ -133,6 +132,14 @@ __bpf_kfunc s64 *bpf_iter_testmod_seq_next(struct bpf_iter_testmod_seq* it) return &it->value; } +__bpf_kfunc s64 bpf_iter_testmod_seq_value(int val, struct bpf_iter_testmod_seq* it__iter) +{ + if (it__iter->cnt < 0) + return 0; + + return val + it__iter->value; +} + __bpf_kfunc void bpf_iter_testmod_seq_destroy(struct bpf_iter_testmod_seq *it) { it->cnt = 0; @@ -376,6 +383,7 @@ BTF_ID_FLAGS(func, bpf_iter_testmod_seq_next, KF_ITER_NEXT | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_iter_testmod_seq_destroy, KF_ITER_DESTROY) BTF_ID_FLAGS(func, bpf_testmod_ctx_create, KF_ACQUIRE | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_testmod_ctx_release, KF_RELEASE) +BTF_ID_FLAGS(func, bpf_iter_testmod_seq_value) BTF_KFUNCS_END(bpf_testmod_common_kfunc_ids) BTF_ID_LIST(bpf_testmod_dtor_ids) diff --git a/tools/testing/selftests/bpf/progs/iters_testmod_seq.c b/tools/testing/selftests/bpf/progs/iters_testmod_seq.c index 3873fb6c292a..4a176e6aede8 100644 --- a/tools/testing/selftests/bpf/progs/iters_testmod_seq.c +++ b/tools/testing/selftests/bpf/progs/iters_testmod_seq.c @@ -12,6 +12,7 @@ struct bpf_iter_testmod_seq { extern int bpf_iter_testmod_seq_new(struct bpf_iter_testmod_seq *it, s64 value, int cnt) __ksym; extern s64 *bpf_iter_testmod_seq_next(struct bpf_iter_testmod_seq *it) __ksym; +extern s64 bpf_iter_testmod_seq_value(int blah, struct bpf_iter_testmod_seq *it) __ksym; extern void bpf_iter_testmod_seq_destroy(struct bpf_iter_testmod_seq *it) __ksym; const volatile __s64 exp_empty = 0 + 1; @@ -76,4 +77,53 @@ int testmod_seq_truncated(const void *ctx) return 0; } +SEC("?raw_tp") +__failure +__msg("expected an initialized iter_testmod_seq as arg #2") +int testmod_seq_getter_before_bad(const void *ctx) +{ + struct bpf_iter_testmod_seq it; + + return bpf_iter_testmod_seq_value(0, &it); +} + +SEC("?raw_tp") +__failure +__msg("expected an initialized iter_testmod_seq as arg 
#2") +int testmod_seq_getter_after_bad(const void *ctx) +{ + struct bpf_iter_testmod_seq it; + s64 sum = 0, *v; + + bpf_iter_testmod_seq_new(&it, 100, 100); + + while ((v = bpf_iter_testmod_seq_next(&it))) { + sum += *v; + } + + bpf_iter_testmod_seq_destroy(&it); + + return sum + bpf_iter_testmod_seq_value(0, &it); +} + +SEC("?socket") +__success __retval(1000000) +int testmod_seq_getter_good(const void *ctx) +{ + struct bpf_iter_testmod_seq it; + s64 sum = 0, *v; + + bpf_iter_testmod_seq_new(&it, 100, 100); + + while ((v = bpf_iter_testmod_seq_next(&it))) { + sum += *v; + } + + sum *= bpf_iter_testmod_seq_value(0, &it); + + bpf_iter_testmod_seq_destroy(&it); + + return sum; +} + char _license[] SEC("license") = "GPL"; -- 2.34.1
From: Martin KaFai Lau <martin.lau@kernel.org> mainline inclusion from mainline-v6.11-rc7 commit b408473ea01b2e499d23503e2bf898416da9d7ac category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- The pointer returned by btf_parse_base could be an error pointer. IS_ERR() check is needed before calling btf_free(base_btf). Fixes: 8646db238997 ("libbpf,bpf: Share BTF relocate-related code with kernel") Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20240830012214.1646005-1-martin.lau@linux.dev Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- kernel/bpf/btf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index a3918dd19331..3b6479ea0eb9 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -5959,7 +5959,7 @@ static struct btf *btf_parse_module(const char *module_name, const void *data, errout: btf_verifier_env_free(env); - if (base_btf != vmlinux_btf) + if (!IS_ERR(base_btf) && base_btf != vmlinux_btf) btf_free(base_btf); if (btf) { kvfree(btf->data); -- 2.34.1
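The fix above hinges on the kernel's ERR_PTR convention: btf_parse_base() can hand back either a real object or an errno encoded into the pointer value, and only the former may be released. Below is a minimal userspace sketch of that convention; the ERR_PTR/IS_ERR helpers and the parse_base() stand-in are simplified re-implementations for illustration only, not the kernel's <linux/err.h>, and the point is simply why the error path must test IS_ERR() before freeing.

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Stand-in for btf_parse_base(): returns an object or an encoded errno. */
static void *parse_base(int fail)
{
	return fail ? ERR_PTR(-ENOENT) : malloc(16);
}

int main(void)
{
	void *base = parse_base(1);

	/* Calling free(base) unconditionally here would hand an
	 * error-encoded value to the allocator, as in the bug above. */
	if (!IS_ERR(base))
		free(base);
	else
		printf("parse failed: %ld\n", PTR_ERR(base));
	return 0;
}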
From: Tony Ambardar <tony.ambardar@gmail.com> mainline inclusion from mainline-v6.12-rc1 commit da18bfa59d403040d8bcba1284285916fe9e4425 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- New split BTF needs to preserve base's endianness. Similarly, when creating a distilled BTF, we need to preserve original endianness. Fix by updating libbpf's btf__distill_base() and btf_new_empty() to retain the byte order of any source BTF objects when creating new ones. Fixes: ba451366bf44 ("libbpf: Implement basic split BTF support") Fixes: 58e185a0dc35 ("libbpf: Add btf__distill_base() creating split BTF with distilled base BTF") Reported-by: Song Liu <song@kernel.org> Reported-by: Eduard Zingerman <eddyz87@gmail.com> Suggested-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: Alan Maguire <alan.maguire@oracle.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/6358db36c5f68b07873a0a5be2d062b1af5ea5f8.camel@g... Link: https://lore.kernel.org/bpf/20240830095150.278881-1-tony.ambardar@gmail.com Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- tools/lib/bpf/btf.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c index ed64b5c207b8..f433a5282042 100644 --- a/tools/lib/bpf/btf.c +++ b/tools/lib/bpf/btf.c @@ -837,6 +837,7 @@ static struct btf *btf_new_empty(struct btf *base_btf) btf->base_btf = base_btf; btf->start_id = btf__type_cnt(base_btf); btf->start_str_off = base_btf->hdr->str_len; + btf->swapped_endian = base_btf->swapped_endian; } /* +1 for empty string at offset 0 */ @@ -5226,6 +5227,9 @@ int btf__distill_base(const struct btf *src_btf, struct btf **new_base_btf, new_base = btf__new_empty(); if (!new_base) return libbpf_err(-ENOMEM); + + btf__set_endianness(new_base, btf__endianness(src_btf)); + dist.id_map = calloc(n, sizeof(*dist.id_map)); if (!dist.id_map) { err = -ENOMEM; -- 2.34.1
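A minimal caller-side sketch of the behaviour this patch fixes, assuming a raw BTF file at a placeholder path and using only libbpf APIs that appear elsewhere in this series (btf__parse(), btf__distill_base(), btf__endianness(), btf__free()). With the fix, both distilled outputs report the same byte order as the source BTF, so callers no longer need an explicit btf__set_endianness() workaround.

#include <stdio.h>
#include <bpf/btf.h>

int main(void)
{
	struct btf *src, *base = NULL, *split = NULL;

	src = btf__parse("/path/to/module.btf", NULL);	/* placeholder path */
	if (!src)
		return 1;

	if (btf__distill_base(src, &base, &split)) {
		btf__free(src);
		return 1;
	}

	/* All three now report the source byte order. */
	printf("src=%d base=%d split=%d\n",
	       btf__endianness(src), btf__endianness(base),
	       btf__endianness(split));

	btf__free(split);
	btf__free(base);
	btf__free(src);
	return 0;
}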
From: Eduard Zingerman <eddyz87@gmail.com> mainline inclusion from mainline-v6.12-rc1 commit 181b0d1af5d5b2ba8996fa715382cbfbfef46b6e category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Create a BTF with endianness different from host, make a distilled base/split BTF pair from it, dump as raw bytes, import again and verify that endianness is preserved. Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Tested-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240830173406.1581007-1-eddyz87@gmail.com Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- .../selftests/bpf/prog_tests/btf_distill.c | 68 +++++++++++++++++++ 1 file changed, 68 insertions(+) diff --git a/tools/testing/selftests/bpf/prog_tests/btf_distill.c b/tools/testing/selftests/bpf/prog_tests/btf_distill.c index bfbe795823a2..ca84726d5ac1 100644 --- a/tools/testing/selftests/bpf/prog_tests/btf_distill.c +++ b/tools/testing/selftests/bpf/prog_tests/btf_distill.c @@ -535,6 +535,72 @@ static void test_distilled_base_vmlinux(void) btf__free(vmlinux_btf); } +/* Split and new base BTFs should inherit endianness from source BTF. */ +static void test_distilled_endianness(void) +{ + struct btf *base = NULL, *split = NULL, *new_base = NULL, *new_split = NULL; + struct btf *new_base1 = NULL, *new_split1 = NULL; + enum btf_endianness inverse_endianness; + const void *raw_data; + __u32 size; + + base = btf__new_empty(); + if (!ASSERT_OK_PTR(base, "empty_main_btf")) + return; + inverse_endianness = btf__endianness(base) == BTF_LITTLE_ENDIAN ? 
BTF_BIG_ENDIAN + : BTF_LITTLE_ENDIAN; + btf__set_endianness(base, inverse_endianness); + btf__add_int(base, "int", 4, BTF_INT_SIGNED); /* [1] int */ + VALIDATE_RAW_BTF( + base, + "[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED"); + split = btf__new_empty_split(base); + if (!ASSERT_OK_PTR(split, "empty_split_btf")) + goto cleanup; + btf__add_ptr(split, 1); + VALIDATE_RAW_BTF( + split, + "[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED", + "[2] PTR '(anon)' type_id=1"); + if (!ASSERT_EQ(0, btf__distill_base(split, &new_base, &new_split), + "distilled_base") || + !ASSERT_OK_PTR(new_base, "distilled_base") || + !ASSERT_OK_PTR(new_split, "distilled_split") || + !ASSERT_EQ(2, btf__type_cnt(new_base), "distilled_base_type_cnt")) + goto cleanup; + VALIDATE_RAW_BTF( + new_split, + "[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED", + "[2] PTR '(anon)' type_id=1"); + + raw_data = btf__raw_data(new_base, &size); + if (!ASSERT_OK_PTR(raw_data, "btf__raw_data #1")) + goto cleanup; + new_base1 = btf__new(raw_data, size); + if (!ASSERT_OK_PTR(new_base1, "new_base1 = btf__new()")) + goto cleanup; + raw_data = btf__raw_data(new_split, &size); + if (!ASSERT_OK_PTR(raw_data, "btf__raw_data #2")) + goto cleanup; + new_split1 = btf__new_split(raw_data, size, new_base1); + if (!ASSERT_OK_PTR(new_split1, "new_split1 = btf__new()")) + goto cleanup; + + ASSERT_EQ(btf__endianness(new_base1), inverse_endianness, "new_base1 endianness"); + ASSERT_EQ(btf__endianness(new_split1), inverse_endianness, "new_split1 endianness"); + VALIDATE_RAW_BTF( + new_split1, + "[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED", + "[2] PTR '(anon)' type_id=1"); +cleanup: + btf__free(new_split1); + btf__free(new_base1); + btf__free(new_split); + btf__free(new_base); + btf__free(split); + btf__free(base); +} + void test_btf_distill(void) { if (test__start_subtest("distilled_base")) @@ -549,4 +615,6 @@ void test_btf_distill(void) test_distilled_base_multi_err2(); if (test__start_subtest("distilled_base_vmlinux")) test_distilled_base_vmlinux(); + if (test__start_subtest("distilled_endianness")) + test_distilled_endianness(); } -- 2.34.1
From: Hou Tao <houtao1@huawei.com> mainline inclusion from mainline-v6.12-rc6 commit 101ccfbabf4738041273ce64e2b116cf440dea13 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- bpf_iter_bits_destroy() uses "kit->nr_bits <= 64" to check whether the bits are dynamically allocated. However, the check is incorrect and may cause a kmemleak as shown below: unreferenced object 0xffff88812628c8c0 (size 32): comm "swapper/0", pid 1, jiffies 4294727320 hex dump (first 32 bytes): b0 c1 55 f5 81 88 ff ff f0 f0 f0 f0 f0 f0 f0 f0 ..U........... f0 f0 f0 f0 f0 f0 f0 f0 00 00 00 00 00 00 00 00 .............. backtrace (crc 781e32cc): [<00000000c452b4ab>] kmemleak_alloc+0x4b/0x80 [<0000000004e09f80>] __kmalloc_node_noprof+0x480/0x5c0 [<00000000597124d6>] __alloc.isra.0+0x89/0xb0 [<000000004ebfffcd>] alloc_bulk+0x2af/0x720 [<00000000d9c10145>] prefill_mem_cache+0x7f/0xb0 [<00000000ff9738ff>] bpf_mem_alloc_init+0x3e2/0x610 [<000000008b616eac>] bpf_global_ma_init+0x19/0x30 [<00000000fc473efc>] do_one_initcall+0xd3/0x3c0 [<00000000ec81498c>] kernel_init_freeable+0x66a/0x940 [<00000000b119f72f>] kernel_init+0x20/0x160 [<00000000f11ac9a7>] ret_from_fork+0x3c/0x70 [<0000000004671da4>] ret_from_fork_asm+0x1a/0x30 That is because nr_bits will be set as zero in bpf_iter_bits_next() after all bits have been iterated. Fix the issue by setting kit->bit to kit->nr_bits instead of setting kit->nr_bits to zero when the iteration completes in bpf_iter_bits_next(). In addition, use "!nr_bits || bits >= nr_bits" to check whether the iteration is complete and still use "nr_bits > 64" to indicate whether bits are dynamically allocated. The "!nr_bits" check is necessary because bpf_iter_bits_new() may fail before setting kit->nr_bits, and this condition will stop the iteration early instead of accessing the zeroed or freed kit->bits. Considering the initial value of kit->bits is -1 and the type of kit->nr_bits is unsigned int, change the type of kit->nr_bits to int. The potential overflow problem will be handled in the following patch. Fixes: 4665415975b0 ("bpf: Add bits iterator") Acked-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20241030100516.3633640-2-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- kernel/bpf/helpers.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 364498c643de..a6897551899a 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -2634,7 +2634,7 @@ struct bpf_iter_bits_kern { unsigned long *bits; unsigned long bits_copy; }; - u32 nr_bits; + int nr_bits; int bit; } __aligned(8); @@ -2708,17 +2708,16 @@ bpf_iter_bits_new(struct bpf_iter_bits *it, const u64 *unsafe_ptr__ign, u32 nr_w __bpf_kfunc int *bpf_iter_bits_next(struct bpf_iter_bits *it) { struct bpf_iter_bits_kern *kit = (void *)it; - u32 nr_bits = kit->nr_bits; + int bit = kit->bit, nr_bits = kit->nr_bits; const unsigned long *bits; - int bit; - if (nr_bits == 0) + if (!nr_bits || bit >= nr_bits) return NULL; bits = nr_bits == 64 ? 
&kit->bits_copy : kit->bits; - bit = find_next_bit(bits, nr_bits, kit->bit + 1); + bit = find_next_bit(bits, nr_bits, bit + 1); if (bit >= nr_bits) { - kit->nr_bits = 0; + kit->bit = bit; return NULL; } -- 2.34.1
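To make the ownership argument above concrete, here is a small userspace model of the iterator state machine (plain malloc/free instead of the BPF memory allocator, and only the dynamically-allocated greater-than-64-bit case is modelled). It shows why zeroing nr_bits on completion, as the old next() did, makes the destroy step believe nothing was heap-allocated, while advancing bit to nr_bits preserves that information.

#include <stdlib.h>

struct bits_iter {
	unsigned long *bits;	/* heap buffer; > 64-bit case only */
	int nr_bits;
	int bit;
};

static void iter_new(struct bits_iter *it, int nr_bits)
{
	it->bits = calloc((nr_bits + 63) / 64, sizeof(*it->bits));
	it->nr_bits = nr_bits;
	it->bit = -1;
}

static int iter_next(struct bits_iter *it, int old_behaviour)
{
	if (!it->nr_bits || it->bit >= it->nr_bits)
		return -1;
	if (++it->bit >= it->nr_bits) {		/* iteration complete */
		if (old_behaviour)
			it->nr_bits = 0;	/* destroys the ownership info */
		return -1;
	}
	return it->bit;
}

static void iter_destroy(struct bits_iter *it)
{
	if (it->nr_bits <= 64)			/* "nothing was heap-allocated" */
		return;
	free(it->bits);
}

int main(void)
{
	struct bits_iter it;

	iter_new(&it, 128);
	while (iter_next(&it, 1) >= 0)
		;
	iter_destroy(&it);			/* old behaviour: buffer leaks */

	iter_new(&it, 128);
	while (iter_next(&it, 0) >= 0)
		;
	iter_destroy(&it);			/* fixed behaviour: buffer freed */
	return 0;
}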
From: Hou Tao <houtao1@huawei.com> mainline inclusion from mainline-v6.12-rc6 commit 62a898b07b83f6f407003d8a70f0827a5af08a59 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Introduce bpf_mem_alloc_check_size() to check whether the allocation size exceeds the limitation for the kmalloc-equivalent allocator. The upper limit for percpu allocation is LLIST_NODE_SZ bytes larger than non-percpu allocation, so a percpu argument is added to the helper. The helper will be used in the following patch to check whether the size parameter passed to bpf_mem_alloc() is too big. Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20241030100516.3633640-3-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Conflicts: include/linux/bpf_mem_alloc.h [Fix conflicts caused by context diff.] Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- include/linux/bpf_mem_alloc.h | 3 +++ kernel/bpf/memalloc.c | 14 +++++++++++++- 2 files changed, 16 insertions(+), 1 deletion(-) diff --git a/include/linux/bpf_mem_alloc.h b/include/linux/bpf_mem_alloc.h index bb1223b21308..50ab9929b3d0 100644 --- a/include/linux/bpf_mem_alloc.h +++ b/include/linux/bpf_mem_alloc.h @@ -25,6 +25,9 @@ struct bpf_mem_alloc { int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu); void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma); +/* Check the allocation size for kmalloc equivalent allocator */ +int bpf_mem_alloc_check_size(bool percpu, size_t size); + /* kmalloc/kfree equivalent: */ void *bpf_mem_alloc(struct bpf_mem_alloc *ma, size_t size); void bpf_mem_free(struct bpf_mem_alloc *ma, void *ptr); diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c index 85f9501ff6e6..38fe720d0f87 100644 --- a/kernel/bpf/memalloc.c +++ b/kernel/bpf/memalloc.c @@ -35,6 +35,8 @@ */ #define LLIST_NODE_SZ sizeof(struct llist_node) +#define BPF_MEM_ALLOC_SIZE_MAX 4096 + /* similar to kmalloc, but sizeof == 8 bucket is gone */ static u8 size_index[24] __ro_after_init = { 3, /* 8 */ @@ -65,7 +67,7 @@ static u8 size_index[24] __ro_after_init = { static int bpf_mem_cache_idx(size_t size) { - if (!size || size > 4096) + if (!size || size > BPF_MEM_ALLOC_SIZE_MAX) return -1; if (size <= 192) @@ -930,3 +932,13 @@ void notrace *bpf_mem_cache_alloc_flags(struct bpf_mem_alloc *ma, gfp_t flags) return !ret ? NULL : ret + LLIST_NODE_SZ; } + +int bpf_mem_alloc_check_size(bool percpu, size_t size) +{ + /* The size of percpu allocation doesn't have LLIST_NODE_SZ overhead */ + if ((percpu && size > BPF_MEM_ALLOC_SIZE_MAX) || + (!percpu && size > BPF_MEM_ALLOC_SIZE_MAX - LLIST_NODE_SZ)) + return -E2BIG; + + return 0; +} -- 2.34.1
From: Hou Tao <houtao1@huawei.com> mainline inclusion from mainline-v6.12-rc6 commit 393397fbdcad7396639d7077c33f86169184ba99 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Check the validity of nr_words in bpf_iter_bits_new(). Without this check, when multiplication overflow occurs for nr_bits (e.g., when nr_words = 0x0400-0001, nr_bits becomes 64), stack corruption may occur due to bpf_probe_read_kernel_common(..., nr_bytes = 0x2000-0008). Fix it by limiting the maximum value of nr_words to 511. The value is derived from the current implementation of BPF memory allocator. To ensure compatibility if the BPF memory allocator's size limitation changes in the future, use the helper bpf_mem_alloc_check_size() to check whether nr_bytes is too larger. And return -E2BIG instead of -ENOMEM for oversized nr_bytes. Fixes: 4665415975b0 ("bpf: Add bits iterator") Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20241030100516.3633640-4-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- kernel/bpf/helpers.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index a6897551899a..d80ea34b133b 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -2629,6 +2629,8 @@ struct bpf_iter_bits { __u64 __opaque[2]; } __aligned(8); +#define BITS_ITER_NR_WORDS_MAX 511 + struct bpf_iter_bits_kern { union { unsigned long *bits; @@ -2643,7 +2645,8 @@ struct bpf_iter_bits_kern { * @it: The new bpf_iter_bits to be created * @unsafe_ptr__ign: A pointer pointing to a memory area to be iterated over * @nr_words: The size of the specified memory area, measured in 8-byte units. - * Due to the limitation of memalloc, it can't be greater than 512. + * The maximum value of @nr_words is @BITS_ITER_NR_WORDS_MAX. This limit may be + * further reduced by the BPF memory allocator implementation. * * This function initializes a new bpf_iter_bits structure for iterating over * a memory area which is specified by the @unsafe_ptr__ign and @nr_words. It @@ -2670,6 +2673,8 @@ bpf_iter_bits_new(struct bpf_iter_bits *it, const u64 *unsafe_ptr__ign, u32 nr_w if (!unsafe_ptr__ign || !nr_words) return -EINVAL; + if (nr_words > BITS_ITER_NR_WORDS_MAX) + return -E2BIG; /* Optimization for u64 mask */ if (nr_bits == 64) { @@ -2681,6 +2686,9 @@ bpf_iter_bits_new(struct bpf_iter_bits *it, const u64 *unsafe_ptr__ign, u32 nr_w return 0; } + if (bpf_mem_alloc_check_size(false, nr_bytes)) + return -E2BIG; + /* Fallback to memalloc */ kit->bits = bpf_mem_alloc(&bpf_global_ma, nr_bytes); if (!kit->bits) -- 2.34.1
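A standalone illustration of the overflow described above, following the arithmetic in the commit message (nr_bytes = nr_words * 8 and nr_bits = nr_bytes * 8, both computed as u32): with nr_words = 0x04000001 the bit count wraps around to 64, which selects the small on-stack fast path, while the byte count used for the copy is still 0x20000008. This is a plain userspace demonstration of the 32-bit wrap-around, not kernel code.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint32_t nr_words = 0x04000001;		/* far above BITS_ITER_NR_WORDS_MAX */
	uint32_t nr_bytes = nr_words * 8;	/* 0x20000008 */
	uint32_t nr_bits = nr_bytes * 8;	/* 0x100000040 wraps to 64 */

	printf("nr_bytes=0x%x nr_bits=%u\n", nr_bytes, nr_bits);
	return 0;
}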
From: Hou Tao <houtao1@huawei.com> mainline inclusion from mainline-v6.12-rc6 commit e1339383675063ae4760d81ffe13a79981841b8d category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- On 32-bit hosts (e.g., arm32), when a bpf program passes a u64 to bpf_iter_bits_new(), bpf_iter_bits_new() will use bits_copy to store the content of the u64. However, bits_copy is only 4 bytes, leading to stack corruption. The straightforward solution would be to replace u64 with unsigned long in bpf_iter_bits_new(). However, this introduces confusion and problems for 32-bit hosts because the size of ulong in bpf program is 8 bytes, but it is treated as 4-bytes after passed to bpf_iter_bits_new(). Fix it by changing the type of both bits and bit_count from unsigned long to u64. However, the change is not enough. The main reason is that bpf_iter_bits_next() uses find_next_bit() to find the next bit and the pointer passed to find_next_bit() is an unsigned long pointer instead of a u64 pointer. For 32-bit little-endian host, it is fine but it is not the case for 32-bit big-endian host. Because under 32-bit big-endian host, the first iterated unsigned long will be the bits 32-63 of the u64 instead of the expected bits 0-31. Therefore, in addition to changing the type, swap the two unsigned longs within the u64 for 32-bit big-endian host. Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20241030100516.3633640-5-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- kernel/bpf/helpers.c | 33 ++++++++++++++++++++++++++++++--- 1 file changed, 30 insertions(+), 3 deletions(-) diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index d80ea34b133b..c6894e40fb02 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -2633,13 +2633,36 @@ struct bpf_iter_bits { struct bpf_iter_bits_kern { union { - unsigned long *bits; - unsigned long bits_copy; + __u64 *bits; + __u64 bits_copy; }; int nr_bits; int bit; } __aligned(8); +/* On 64-bit hosts, unsigned long and u64 have the same size, so passing + * a u64 pointer and an unsigned long pointer to find_next_bit() will + * return the same result, as both point to the same 8-byte area. + * + * For 32-bit little-endian hosts, using a u64 pointer or unsigned long + * pointer also makes no difference. This is because the first iterated + * unsigned long is composed of bits 0-31 of the u64 and the second unsigned + * long is composed of bits 32-63 of the u64. + * + * However, for 32-bit big-endian hosts, this is not the case. The first + * iterated unsigned long will be bits 32-63 of the u64, so swap these two + * ulong values within the u64. 
+ */ +static void swap_ulong_in_u64(u64 *bits, unsigned int nr) +{ +#if (BITS_PER_LONG == 32) && defined(__BIG_ENDIAN) + unsigned int i; + + for (i = 0; i < nr; i++) + bits[i] = (bits[i] >> 32) | ((u64)(u32)bits[i] << 32); +#endif +} + /** * bpf_iter_bits_new() - Initialize a new bits iterator for a given memory area * @it: The new bpf_iter_bits to be created @@ -2682,6 +2705,8 @@ bpf_iter_bits_new(struct bpf_iter_bits *it, const u64 *unsafe_ptr__ign, u32 nr_w if (err) return -EFAULT; + swap_ulong_in_u64(&kit->bits_copy, nr_words); + kit->nr_bits = nr_bits; return 0; } @@ -2700,6 +2725,8 @@ bpf_iter_bits_new(struct bpf_iter_bits *it, const u64 *unsafe_ptr__ign, u32 nr_w return err; } + swap_ulong_in_u64(kit->bits, nr_words); + kit->nr_bits = nr_bits; return 0; } @@ -2717,7 +2744,7 @@ __bpf_kfunc int *bpf_iter_bits_next(struct bpf_iter_bits *it) { struct bpf_iter_bits_kern *kit = (void *)it; int bit = kit->bit, nr_bits = kit->nr_bits; - const unsigned long *bits; + const void *bits; if (!nr_bits || bit >= nr_bits) return NULL; -- 2.34.1
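To illustrate the word-order problem described above: find_next_bit() walks an array of native words starting at word 0, so on a 32-bit big-endian host the first word in memory carries bits 32-63 of the u64. The self-contained sketch below simulates both memory layouts with a naive scanner; it only demonstrates the indexing, it is not the kernel helper.

#include <stdint.h>
#include <stdio.h>

/* Naive find_next_bit(): word 0 is treated as bits 0-31, word 1 as 32-63. */
static int first_set_bit(const uint32_t w[2])
{
	for (int i = 0; i < 64; i++)
		if (w[i / 32] & (1u << (i % 32)))
			return i;
	return 64;
}

int main(void)
{
	uint64_t val = 1ULL << 32;
	/* In-memory word order of 'val' on 32-bit hosts of each endianness. */
	uint32_t le[2] = { (uint32_t)val, (uint32_t)(val >> 32) };	/* lo, hi */
	uint32_t be[2] = { (uint32_t)(val >> 32), (uint32_t)val };	/* hi, lo */

	printf("LE host reports bit %d\n", first_set_bit(le));			/* 32 */
	printf("BE host reports bit %d (hence the swap)\n", first_set_bit(be));	/* 0 */
	return 0;
}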
From: Hou Tao <houtao1@huawei.com> mainline inclusion from mainline-v6.12-rc6 commit ebafc1e535db19505aec3b94a4a641fe735a2eac category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Add more test cases for bits iterator: (1) huge word test Verify the multiplication overflow of nr_bits in bits_iter. Without the overflow check, when nr_words is 67108865, nr_bits becomes 64, causing bpf_probe_read_kernel_common() to corrupt the stack. (2) max word test Verify correct handling of maximum nr_words value (511). (3) bad word test Verify early termination of bits iteration when bits iterator initialization fails. Also rename bits_nomem to bits_too_big to better reflect its purpose. Signed-off-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/r/20241030100516.3633640-6-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- .../selftests/bpf/progs/verifier_bits_iter.c | 61 ++++++++++++++++++- 1 file changed, 58 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/bpf/progs/verifier_bits_iter.c b/tools/testing/selftests/bpf/progs/verifier_bits_iter.c index 716113c2bce2..e1421d846ac8 100644 --- a/tools/testing/selftests/bpf/progs/verifier_bits_iter.c +++ b/tools/testing/selftests/bpf/progs/verifier_bits_iter.c @@ -15,6 +15,8 @@ int bpf_iter_bits_new(struct bpf_iter_bits *it, const u64 *unsafe_ptr__ign, int *bpf_iter_bits_next(struct bpf_iter_bits *it) __ksym __weak; void bpf_iter_bits_destroy(struct bpf_iter_bits *it) __ksym __weak; +u64 bits_array[511] = {}; + SEC("iter.s/cgroup") __description("bits iter without destroy") __failure __msg("Unreleased reference") @@ -110,16 +112,16 @@ int bit_index(void) } SEC("syscall") -__description("bits nomem") +__description("bits too big") __success __retval(0) -int bits_nomem(void) +int bits_too_big(void) { u64 data[4]; int nr = 0; int *bit; __builtin_memset(&data, 0xff, sizeof(data)); - bpf_for_each(bits, bit, &data[0], 513) /* Be greater than 512 */ + bpf_for_each(bits, bit, &data[0], 512) /* Be greater than 511 */ nr++; return nr; } @@ -151,3 +153,56 @@ int zero_words(void) nr++; return nr; } + +SEC("syscall") +__description("huge words") +__success __retval(0) +int huge_words(void) +{ + u64 data[8] = {0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1, 0x1}; + int nr = 0; + int *bit; + + bpf_for_each(bits, bit, &data[0], 67108865) + nr++; + return nr; +} + +SEC("syscall") +__description("max words") +__success __retval(4) +int max_words(void) +{ + volatile int nr = 0; + int *bit; + + bits_array[0] = (1ULL << 63) | 1U; + bits_array[510] = (1ULL << 33) | (1ULL << 32); + + bpf_for_each(bits, bit, bits_array, 511) { + if (nr == 0 && *bit != 0) + break; + if (nr == 2 && *bit != 32672) + break; + nr++; + } + return nr; +} + +SEC("syscall") +__description("bad words") +__success __retval(0) +int bad_words(void) +{ + void *bad_addr = (void *)(3UL << 30); + int nr = 0; + int *bit; + + bpf_for_each(bits, bit, bad_addr, 1) + nr++; + + bpf_for_each(bits, bit, bad_addr, 4) + nr++; + + return nr; +} -- 2.34.1
From: Hou Tao <houtao1@huawei.com> mainline inclusion from mainline-v6.12 commit 6801cf7890f2ed8fcc14859b47501f8ee7a58ec7 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- As reported by Byeonguk, the bad_words test in verifier_bits_iter.c occasionally fails on s390 host. Quoting Ilya's explanation: s390 kernel runs in a completely separate address space, there is no user/kernel split at TASK_SIZE. The same address may be valid in both the kernel and the user address spaces, there is no way to tell by looking at it. The config option related to this property is ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE. Also, unfortunately, 0 is a valid address in the s390 kernel address space. Fix the issue by using -4095 as the bad address for bits iterator, as suggested by Ilya. Verify that bpf_iter_bits_new() returns -EINVAL for NULL address and -EFAULT for bad address. Fixes: ebafc1e535db ("selftests/bpf: Add three test cases for bits_iter") Reported-by: Byeonguk Jeong <jungbu2855@gmail.com> Closes: https://lore.kernel.org/bpf/ZycSXwjH4UTvx-Cn@ub22/ Signed-off-by: Hou Tao <houtao1@huawei.com> Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20241105043057.3371482-1-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- .../selftests/bpf/progs/verifier_bits_iter.c | 32 ++++++++++++++++--- 1 file changed, 28 insertions(+), 4 deletions(-) diff --git a/tools/testing/selftests/bpf/progs/verifier_bits_iter.c b/tools/testing/selftests/bpf/progs/verifier_bits_iter.c index e1421d846ac8..b6b018baed46 100644 --- a/tools/testing/selftests/bpf/progs/verifier_bits_iter.c +++ b/tools/testing/selftests/bpf/progs/verifier_bits_iter.c @@ -57,9 +57,15 @@ __description("null pointer") __success __retval(0) int null_pointer(void) { - int nr = 0; + struct bpf_iter_bits iter; + int err, nr = 0; int *bit; + err = bpf_iter_bits_new(&iter, NULL, 1); + bpf_iter_bits_destroy(&iter); + if (err != -EINVAL) + return 1; + bpf_for_each(bits, bit, NULL, 1) nr++; return nr; @@ -194,15 +200,33 @@ __description("bad words") __success __retval(0) int bad_words(void) { - void *bad_addr = (void *)(3UL << 30); - int nr = 0; + void *bad_addr = (void *)-4095; + struct bpf_iter_bits iter; + volatile int nr; int *bit; + int err; + + err = bpf_iter_bits_new(&iter, bad_addr, 1); + bpf_iter_bits_destroy(&iter); + if (err != -EFAULT) + return 1; + nr = 0; bpf_for_each(bits, bit, bad_addr, 1) nr++; + if (nr != 0) + return 2; + err = bpf_iter_bits_new(&iter, bad_addr, 4); + bpf_iter_bits_destroy(&iter); + if (err != -EFAULT) + return 3; + + nr = 0; bpf_for_each(bits, bit, bad_addr, 4) nr++; + if (nr != 0) + return 4; - return nr; + return 0; } -- 2.34.1
From: Martin KaFai Lau <martin.lau@kernel.org> mainline inclusion from mainline-v6.14-rc1 commit 96ea081ed52bf077cad6d00153b6fba68e510767 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- There is a UAF report in the bpf_struct_ops when CONFIG_MODULES=n. In particular, the report is on tcp_congestion_ops that has a "struct module *owner" member. For struct_ops that has a "struct module *owner" member, it can be extended either by the regular kernel module or by the bpf_struct_ops. bpf_try_module_get() will be used to do the refcounting and different refcount is done based on the owner pointer. When CONFIG_MODULES=n, the btf_id of the "struct module" is missing: WARN: resolve_btfids: unresolved symbol module Thus, the bpf_try_module_get() cannot do the correct refcounting. Not all subsystem's struct_ops requires the "struct module *owner" member. e.g. the recent sched_ext_ops. This patch is to disable bpf_struct_ops registration if the struct_ops has the "struct module *" member and the "struct module" btf_id is missing. The btf_type_is_fwd() helper is moved to the btf.h header file for this test. This has happened since the beginning of bpf_struct_ops which has gone through many changes. The Fixes tag is set to a recent commit that this patch can apply cleanly. Considering CONFIG_MODULES=n is not common and the age of the issue, targeting for bpf-next also. Fixes: 1611603537a4 ("bpf: Create argument information for nullable arguments.") Reported-by: Robert Morris <rtm@csail.mit.edu> Closes: https://lore.kernel.org/bpf/74665.1733669976@localhost/ Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Tested-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20241220201818.127152-1-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- include/linux/btf.h | 5 +++++ kernel/bpf/bpf_struct_ops.c | 21 +++++++++++++++++++++ kernel/bpf/btf.c | 5 ----- 3 files changed, 26 insertions(+), 5 deletions(-) diff --git a/include/linux/btf.h b/include/linux/btf.h index 3dad546d37aa..b597008ece90 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -352,6 +352,11 @@ static inline bool btf_type_is_scalar(const struct btf_type *t) return btf_type_is_int(t) || btf_type_is_enum(t); } +static inline bool btf_type_is_fwd(const struct btf_type *t) +{ + return BTF_INFO_KIND(t->info) == BTF_KIND_FWD; +} + static inline bool btf_type_is_typedef(const struct btf_type *t) { return BTF_INFO_KIND(t->info) == BTF_KIND_TYPEDEF; diff --git a/kernel/bpf/bpf_struct_ops.c b/kernel/bpf/bpf_struct_ops.c index 1409a844183c..32143bc7f73b 100644 --- a/kernel/bpf/bpf_struct_ops.c +++ b/kernel/bpf/bpf_struct_ops.c @@ -307,6 +307,20 @@ void bpf_struct_ops_desc_release(struct bpf_struct_ops_desc *st_ops_desc) kfree(arg_info); } +static bool is_module_member(const struct btf *btf, u32 id) +{ + const struct btf_type *t; + + t = btf_type_resolve_ptr(btf, id, NULL); + if (!t) + return false; + + if (!__btf_type_is_struct(t) && !btf_type_is_fwd(t)) + return false; + + return !strcmp(btf_name_by_offset(btf, t->name_off), "module"); +} + int bpf_struct_ops_desc_init(struct bpf_struct_ops_desc *st_ops_desc, struct btf *btf, struct bpf_verifier_log *log) @@ -381,6 +395,13 @@ int 
bpf_struct_ops_desc_init(struct bpf_struct_ops_desc *st_ops_desc, goto errout; } + if (!st_ops_ids[IDX_MODULE_ID] && is_module_member(btf, member->type)) { + pr_warn("'struct module' btf id not found. Is CONFIG_MODULES enabled? bpf_struct_ops '%s' needs module support.\n", + st_ops->name); + err = -EOPNOTSUPP; + goto errout; + } + func_proto = btf_type_resolve_func_ptr(btf, member->type, NULL); diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index 3b6479ea0eb9..1c5ed5d4b591 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -501,11 +501,6 @@ bool btf_type_is_void(const struct btf_type *t) return t == &btf_void; } -static bool btf_type_is_fwd(const struct btf_type *t) -{ - return BTF_INFO_KIND(t->info) == BTF_KIND_FWD; -} - static bool btf_type_is_datasec(const struct btf_type *t) { return BTF_INFO_KIND(t->info) == BTF_KIND_DATASEC; -- 2.34.1
From: Pu Lehui <pulehui@huawei.com> mainline inclusion from mainline-v6.14-rc1 commit 4a04cb326a6c7f9a2c066f8c2ca78a5a9b87ddab category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- Fix btf leak on new btf alloc failure in btf_distill test. Fixes: affdeb50616b ("selftests/bpf: Extend distilled BTF tests to cover BTF relocation") Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250115100241.4171581-1-pulehui@huaweicloud.com Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- tools/testing/selftests/bpf/prog_tests/btf_distill.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/bpf/prog_tests/btf_distill.c b/tools/testing/selftests/bpf/prog_tests/btf_distill.c index ca84726d5ac1..b72b966df77b 100644 --- a/tools/testing/selftests/bpf/prog_tests/btf_distill.c +++ b/tools/testing/selftests/bpf/prog_tests/btf_distill.c @@ -385,7 +385,7 @@ static void test_distilled_base_missing_err(void) "[2] INT 'int' size=8 bits_offset=0 nr_bits=64 encoding=SIGNED"); btf5 = btf__new_empty(); if (!ASSERT_OK_PTR(btf5, "empty_reloc_btf")) - return; + goto cleanup; btf__add_int(btf5, "int", 4, BTF_INT_SIGNED); /* [1] int */ VALIDATE_RAW_BTF( btf5, @@ -478,7 +478,7 @@ static void test_distilled_base_multi_err2(void) "[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED"); btf5 = btf__new_empty(); if (!ASSERT_OK_PTR(btf5, "empty_reloc_btf")) - return; + goto cleanup; btf__add_int(btf5, "int", 4, BTF_INT_SIGNED); /* [1] int */ btf__add_int(btf5, "int", 4, BTF_INT_SIGNED); /* [2] int */ VALIDATE_RAW_BTF( -- 2.34.1
From: Pu Lehui <pulehui@huawei.com> mainline inclusion from mainline-v6.14-rc1 commit 5436a54332c19df0acbef2b87cbf9f7cba56f2dd category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- The error number of elf_begin is omitted when encapsulating the btf_find_elf_sections function. Fixes: c86f180ffc99 ("libbpf: Make btf_parse_elf process .BTF.base transparently") Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250115100241.4171581-2-pulehui@huaweicloud.com Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- tools/lib/bpf/btf.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c index f433a5282042..4ed135131bcb 100644 --- a/tools/lib/bpf/btf.c +++ b/tools/lib/bpf/btf.c @@ -1025,6 +1025,7 @@ static struct btf *btf_parse_elf(const char *path, struct btf *base_btf, elf = elf_begin(fd, ELF_C_READ, NULL); if (!elf) { + err = -LIBBPF_ERRNO__FORMAT; pr_warn("failed to open %s as ELF file\n", path); goto done; } -- 2.34.1
From: Pu Lehui <pulehui@huawei.com> mainline inclusion from mainline-v6.14-rc1 commit 5ca681a86ef93369685cb63f71994f4cf7303e7c category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- When redirecting the split BTF to the vmlinux base BTF, we need to mark the distilled base struct/union members of split BTF structs/unions in id_map with BTF_IS_EMBEDDED. This indicates that these types must match both name and size later. Therefore, we need to traverse the entire split BTF, which involves traversing type IDs from nr_dist_base_types to nr_types. However, the current implementation uses an incorrect traversal end type ID, so let's correct it. Fixes: 19e00c897d50 ("libbpf: Split BTF relocation") Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250115100241.4171581-3-pulehui@huaweicloud.com Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- tools/lib/bpf/btf_relocate.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/lib/bpf/btf_relocate.c b/tools/lib/bpf/btf_relocate.c index 6c9a16115bf0..363f3cdd9981 100644 --- a/tools/lib/bpf/btf_relocate.c +++ b/tools/lib/bpf/btf_relocate.c @@ -212,7 +212,7 @@ static int btf_relocate_map_distilled_base(struct btf_relocate *r) * need to match both name and size, otherwise embedding the base * struct/union in the split type is invalid. */ - for (id = r->nr_dist_base_types; id < r->nr_split_types; id++) { + for (id = r->nr_dist_base_types; id < r->nr_dist_base_types + r->nr_split_types; id++) { err = btf_mark_embedded_composite_type_ids(r, id); if (err) goto done; -- 2.34.1
From: Pu Lehui <pulehui@huawei.com> mainline inclusion from mainline-v6.14-rc1 commit 556a399406635566413f9c71b134d5d287b25b29 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- When redirecting the split BTF to the vmlinux base BTF, we need to mark the distilled base struct/union members of split BTF structs/unions in id_map with BTF_IS_EMBEDDED. This indicates that these types must match both name and size later. So if a needed composite type, which is the member of composite type in the split BTF, has a different size in the base BTF we wish to relocate with, btf__relocate() should error out. Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250115100241.4171581-4-pulehui@huaweicloud.com Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- .../selftests/bpf/prog_tests/btf_distill.c | 72 +++++++++++++++++++ 1 file changed, 72 insertions(+) diff --git a/tools/testing/selftests/bpf/prog_tests/btf_distill.c b/tools/testing/selftests/bpf/prog_tests/btf_distill.c index b72b966df77b..fb67ae195a73 100644 --- a/tools/testing/selftests/bpf/prog_tests/btf_distill.c +++ b/tools/testing/selftests/bpf/prog_tests/btf_distill.c @@ -601,6 +601,76 @@ static void test_distilled_endianness(void) btf__free(base); } +/* If a needed composite type, which is the member of composite type + * in the split BTF, has a different size in the base BTF we wish to + * relocate with, btf__relocate() should error out. + */ +static void test_distilled_base_embedded_err(void) +{ + struct btf *btf1 = NULL, *btf2 = NULL, *btf3 = NULL, *btf4 = NULL, *btf5 = NULL; + + btf1 = btf__new_empty(); + if (!ASSERT_OK_PTR(btf1, "empty_main_btf")) + return; + + btf__add_int(btf1, "int", 4, BTF_INT_SIGNED); /* [1] int */ + btf__add_struct(btf1, "s1", 4); /* [2] struct s1 { */ + btf__add_field(btf1, "f1", 1, 0, 0); /* int f1; */ + /* } */ + VALIDATE_RAW_BTF( + btf1, + "[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED", + "[2] STRUCT 's1' size=4 vlen=1\n" + "\t'f1' type_id=1 bits_offset=0"); + + btf2 = btf__new_empty_split(btf1); + if (!ASSERT_OK_PTR(btf2, "empty_split_btf")) + goto cleanup; + + btf__add_struct(btf2, "with_embedded", 8); /* [3] struct with_embedded { */ + btf__add_field(btf2, "e1", 2, 0, 0); /* struct s1 e1; */ + /* } */ + + VALIDATE_RAW_BTF( + btf2, + "[1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED", + "[2] STRUCT 's1' size=4 vlen=1\n" + "\t'f1' type_id=1 bits_offset=0", + "[3] STRUCT 'with_embedded' size=8 vlen=1\n" + "\t'e1' type_id=2 bits_offset=0"); + + if (!ASSERT_EQ(0, btf__distill_base(btf2, &btf3, &btf4), + "distilled_base") || + !ASSERT_OK_PTR(btf3, "distilled_base") || + !ASSERT_OK_PTR(btf4, "distilled_split") || + !ASSERT_EQ(2, btf__type_cnt(btf3), "distilled_base_type_cnt")) + goto cleanup; + + VALIDATE_RAW_BTF( + btf4, + "[1] STRUCT 's1' size=4 vlen=0", + "[2] STRUCT 'with_embedded' size=8 vlen=1\n" + "\t'e1' type_id=1 bits_offset=0"); + + btf5 = btf__new_empty(); + if (!ASSERT_OK_PTR(btf5, "empty_reloc_btf")) + goto cleanup; + + btf__add_int(btf5, "int", 4, BTF_INT_SIGNED); /* [1] int */ + /* struct with the same name but different size */ + btf__add_struct(btf5, "s1", 8); /* [2] struct s1 { */ + btf__add_field(btf5, "f1", 1, 0, 0); /* int f1; */ + /* } */ + + ASSERT_EQ(btf__relocate(btf4, btf5), -EINVAL, 
"relocate_split"); +cleanup: + btf__free(btf5); + btf__free(btf4); + btf__free(btf3); + btf__free(btf2); + btf__free(btf1); +} + void test_btf_distill(void) { if (test__start_subtest("distilled_base")) @@ -613,6 +683,8 @@ void test_btf_distill(void) test_distilled_base_multi_err(); if (test__start_subtest("distilled_base_multi_err2")) test_distilled_base_multi_err2(); + if (test__start_subtest("distilled_base_embedded_err")) + test_distilled_base_embedded_err(); if (test__start_subtest("distilled_base_vmlinux")) test_distilled_base_vmlinux(); if (test__start_subtest("distilled_endianness")) -- 2.34.1
From: Pu Lehui <pulehui@huawei.com> hulk inclusion category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA -------------------------------- When the bpf selftests are compiled, an extra untracked file_read_pattern file always shows up in git status, because commit 9c9387a5464a ("selftests/bpf: add demo for file read pattern detection") did not add it to the .gitignore file. Fixes: 9c9387a5464a ("selftests/bpf: add demo for file read pattern detection") Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- tools/testing/selftests/bpf/.gitignore | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/testing/selftests/bpf/.gitignore b/tools/testing/selftests/bpf/.gitignore index f1aebabfb017..729d305aa53c 100644 --- a/tools/testing/selftests/bpf/.gitignore +++ b/tools/testing/selftests/bpf/.gitignore @@ -52,3 +52,4 @@ xdp_redirect_multi xdp_synproxy xdp_hw_metadata xdp_features +file_read_pattern -- 2.34.1
From: "T.J. Mercier" <tjmercier@google.com> mainline inclusion from mainline-v6.15-rc1 commit dc438a9bc7610f23cca29a21bb5588b3c09d37f0 category: feature bugzilla: https://atomgit.com/openeuler/kernel/issues/8335 CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... ---------------------------------------------------------------------- This file was renamed from bpf_iter_task_vma.c. Fixes: 45b38941c81f ("selftests/bpf: Rename bpf_iter_task_vma.c to bpf_iter_task_vmas.c") Signed-off-by: T.J. Mercier <tjmercier@google.com> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/20250304204520.201115-1-tjmercier@google.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Luo Gengkun <luogengkun2@huawei.com> --- Documentation/bpf/bpf_iterators.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/bpf/bpf_iterators.rst b/Documentation/bpf/bpf_iterators.rst index 07433915aa41..7f514cb6b052 100644 --- a/Documentation/bpf/bpf_iterators.rst +++ b/Documentation/bpf/bpf_iterators.rst @@ -86,7 +86,7 @@ following steps: The following are a few examples of selftest BPF iterator programs: * `bpf_iter_tcp4.c <https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/bpf_iter_tcp4.c>`_ -* `bpf_iter_task_vma.c <https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/bpf_iter_task_vma.c>`_ +* `bpf_iter_task_vmas.c <https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/bpf_iter_task_vmas.c>`_ * `bpf_iter_task_file.c <https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/bpf_iter_task_file.c>`_ Let us look at ``bpf_iter_task_file.c``, which runs in kernel space: -- 2.34.1