[PATCH OLK-6.6 00/10] bpf mainline for olk-6.6
Daniel Borkmann (1): bpf, arm64: Fix off-by-one in check_imm signed range check David Carlier (1): bpf: Use RCU-safe iteration in dev_map_redirect_multi() SKB path Jiayuan Chen (2): bpf: Fix use-after-free in offloaded map/prog info fill bpf: Drop task_to_inode and inet_conn_established from lsm sleepable hooks Lang Xu (1): bpf: Fix OOB in pcpu_init_value MingTao Huang (1): bpf: Fix stale offload->prog pointer after constant blinding Mykyta Yatsenko (2): bpf: Fix NULL deref in map_kptr_match_type for scalar regs bpf: Use copy_map_value_locked() in alloc_htab_elem() for BPF_F_LOCK Sechang Lim (1): bpf: Fix RCU stall in bpf_fd_array_map_clear() Weiming Shi (1): bpf: fix end-of-list detection in cgroup_storage_get_next_key() arch/arm64/net/bpf_jit_comp.c | 4 ++-- kernel/bpf/arraymap.c | 4 +++- kernel/bpf/bpf_lsm.c | 2 -- kernel/bpf/core.c | 2 ++ kernel/bpf/devmap.c | 5 ++--- kernel/bpf/hashtab.c | 12 ++++++++---- kernel/bpf/local_storage.c | 2 +- kernel/bpf/offload.c | 10 ++++------ kernel/bpf/verifier.c | 5 ++++- 9 files changed, 26 insertions(+), 20 deletions(-) -- 2.34.1
From: Mykyta Yatsenko <yatsenko@meta.com> mainline inclusion from mainline-v7.1-rc1 commit 4d0a375887ab4d49e4da1ff10f9606cab8f7c3ad category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/9180 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Commit ab6c637ad027 ("bpf: Fix a bpf_kptr_xchg() issue with local kptr") refactored map_kptr_match_type() to branch on btf_is_kernel() before checking base_type(). A scalar register stored into a kptr slot has no btf, so the btf_is_kernel(reg->btf) call dereferences NULL. Move the base_type() != PTR_TO_BTF_ID guard before any reg->btf access. Fixes: ab6c637ad027 ("bpf: Fix a bpf_kptr_xchg() issue with local kptr") Reported-by: Hiker Cl <clhiker365@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221372 Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Acked-by: Paul Chaignon <paul.chaignon@gmail.com> Link: https://lore.kernel.org/r/20260416-kptr_crash-v1-1-5589356584b4@meta.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Conflicts: kernel/bpf/verifier.c [ctx conflicts] Signed-off-by: Pu Lehui <pulehui@huawei.com> --- kernel/bpf/verifier.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 0f61022fd885..b24bcb6af69c 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -5376,6 +5376,9 @@ static int map_kptr_match_type(struct bpf_verifier_env *env, int perm_flags; const char *reg_name = ""; + if (base_type(reg->type) != PTR_TO_BTF_ID) + goto bad_type; + if (btf_is_kernel(reg->btf)) { perm_flags = PTR_MAYBE_NULL | PTR_TRUSTED | MEM_RCU; @@ -5386,7 +5389,7 @@ static int map_kptr_match_type(struct bpf_verifier_env *env, perm_flags = PTR_MAYBE_NULL | MEM_ALLOC; } - if (base_type(reg->type) != PTR_TO_BTF_ID || (type_flag(reg->type) & ~perm_flags)) + if (type_flag(reg->type) & ~perm_flags) goto bad_type; /* We need to verify reg->type and reg->btf, before accessing reg->btf */ -- 2.34.1
From: Daniel Borkmann <daniel@iogearbox.net> mainline inclusion from mainline-v7.1-rc1 commit 1dd8be4ec722ce54e4cace59f3a4ba658111b3ec category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/9180 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- check_imm(bits, imm) is used in the arm64 BPF JIT to verify that a branch displacement (in arm64 instruction units) fits into the signed N-bit immediate field of a B, B.cond or CBZ/CBNZ encoding before it is handed to the encoder. The macro currently tests for (imm > 0 && imm >> bits) || (imm < 0 && ~imm >> bits) which admits values in [-2^N, 2^N) — effectively a signed (N+1)-bit range. A signed N-bit field only holds [-2^(N-1), 2^(N-1)), so the check admits one extra bit of range on each side. In particular, for check_imm19(), values in [2^18, 2^19) slip past the check but do not fit into the 19-bit signed imm19 field of B.cond. aarch64_insn_encode_immediate() then masks the raw value into the 19-bit field, setting bit 18 (the sign bit) and flipping a forward branch into a backward one. Same class of issue exists for check_imm26() and the B/BL encoding. Shift by (bits - 1) instead of bits so the actual signed N-bit range is enforced. Fixes: e54bcde3d69d ("arm64: eBPF JIT compiler") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Puranjay Mohan <puranjay@kernel.org> Link: https://lore.kernel.org/r/20260415121403.639619-2-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org> Conflicts: arch/arm64/net/bpf_jit_comp.c [ctx conflicts] Signed-off-by: Pu Lehui <pulehui@huawei.com> --- arch/arm64/net/bpf_jit_comp.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c index 057828e9b3f8..971b75b8059f 100644 --- a/arch/arm64/net/bpf_jit_comp.c +++ b/arch/arm64/net/bpf_jit_comp.c @@ -33,8 +33,8 @@ #define FP_BOTTOM (MAX_BPF_JIT_REG + 4) #define check_imm(bits, imm) do { \ - if ((((imm) > 0) && ((imm) >> (bits))) || \ - (((imm) < 0) && (~(imm) >> (bits)))) { \ + if ((((imm) > 0) && ((imm) >> ((bits) - 1))) || \ + (((imm) < 0) && (~(imm) >> ((bits) - 1)))) { \ pr_info("[%2d] imm=%d(0x%x) out of range\n", \ i, imm, imm); \ return -EINVAL; \ -- 2.34.1
From: Jiayuan Chen <jiayuan.chen@linux.dev> mainline inclusion from mainline-v7.1-rc1 commit a0c584fc18056709c8e047a82a6045d6c209f4ce category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/9180 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- When querying info for an offloaded BPF map or program, bpf_map_offload_info_fill_ns() and bpf_prog_offload_info_fill_ns() obtain the network namespace with get_net(dev_net(offmap->netdev)). However, the associated netdev's netns may be racing with teardown during netns destruction. If the netns refcount has already reached 0, get_net() performs a refcount_t increment on 0, triggering: refcount_t: addition on 0; use-after-free. Although rtnl_lock and bpf_devs_lock ensure the netdev pointer remains valid, they cannot prevent the netns refcount from reaching zero. Fix this by using maybe_get_net() instead of get_net(). maybe_get_net() uses refcount_inc_not_zero() and returns NULL if the refcount is already zero, which causes ns_get_path_cb() to fail and the caller to return -ENOENT -- the correct behavior when the netns is being destroyed. Fixes: 675fc275a3a2d ("bpf: offload: report device information for offloaded programs") Fixes: 52775b33bb507 ("bpf: offload: report device information about offloaded maps") Reported-by: Yinhao Hu <dddddd@hust.edu.cn> Reported-by: Kaiyan Mei <M202472210@hust.edu.cn> Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn> Closes: https://lore.kernel.org/bpf/f0aa3678-79c9-47ae-9e8c-02a3d1df160a@hust.edu.cn... Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20260409023733.168050-1-jiayuan.chen@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Pu Lehui <pulehui@huawei.com> --- kernel/bpf/offload.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c index 87d6693d8233..60c82c328d42 100644 --- a/kernel/bpf/offload.c +++ b/kernel/bpf/offload.c @@ -437,9 +437,8 @@ static struct ns_common *bpf_prog_offload_info_fill_ns(void *private_data) if (aux->offload) { args->info->ifindex = aux->offload->netdev->ifindex; - net = dev_net(aux->offload->netdev); - get_net(net); - ns = &net->ns; + net = maybe_get_net(dev_net(aux->offload->netdev)); + ns = net ? &net->ns : NULL; } else { args->info->ifindex = 0; ns = NULL; @@ -645,9 +644,8 @@ static struct ns_common *bpf_map_offload_info_fill_ns(void *private_data) if (args->offmap->netdev) { args->info->ifindex = args->offmap->netdev->ifindex; - net = dev_net(args->offmap->netdev); - get_net(net); - ns = &net->ns; + net = maybe_get_net(dev_net(args->offmap->netdev)); + ns = net ? &net->ns : NULL; } else { args->info->ifindex = 0; ns = NULL; -- 2.34.1
From: Jiayuan Chen <jiayuan.chen@linux.dev> mainline inclusion from mainline-v7.1-rc1 commit beaf0e96b1da74549a6cabd040f9667d83b2e97e category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/9180 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- bpf_lsm_task_to_inode() is called under rcu_read_lock() and bpf_lsm_inet_conn_established() is called from softirq context, so neither hook can be used by sleepable LSM programs. Fixes: 423f16108c9d8 ("bpf: Augment the set of sleepable LSM hooks") Reported-by: Quan Sun <2022090917019@std.uestc.edu.cn> Reported-by: Yinhao Hu <dddddd@hust.edu.cn> Reported-by: Kaiyan Mei <M202472210@hust.edu.cn> Reported-by: Dongliang Mu <dzm91@hust.edu.cn> Closes: https://lore.kernel.org/bpf/3ab69731-24d1-431a-a351-452aafaaf2a5@std.uestc.e... Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Link: https://lore.kernel.org/r/20260407122334.344072-1-jiayuan.chen@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org> Conflicts: kernel/bpf/bpf_lsm.c [ctx conflicts] Signed-off-by: Pu Lehui <pulehui@huawei.com> --- kernel/bpf/bpf_lsm.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c index 476a79f0ff2d..eae66ba6c6c5 100644 --- a/kernel/bpf/bpf_lsm.c +++ b/kernel/bpf/bpf_lsm.c @@ -302,7 +302,6 @@ BTF_ID(func, bpf_lsm_file_open) BTF_ID(func, bpf_lsm_file_receive) #ifdef CONFIG_SECURITY_NETWORK -BTF_ID(func, bpf_lsm_inet_conn_established) #endif /* CONFIG_SECURITY_NETWORK */ BTF_ID(func, bpf_lsm_inode_create) @@ -365,7 +364,6 @@ BTF_ID(func, bpf_lsm_current_getsecid_subj) BTF_ID(func, bpf_lsm_task_getsecid_obj) BTF_ID(func, bpf_lsm_task_prctl) BTF_ID(func, bpf_lsm_task_setscheduler) -BTF_ID(func, bpf_lsm_task_to_inode) BTF_ID(func, bpf_lsm_userns_create) BTF_SET_END(sleepable_lsm_hooks) -- 2.34.1
From: Sechang Lim <rhkrqnwk98@gmail.com> mainline inclusion from mainline-v7.1-rc1 commit 4406942e65ca128c56c67443832988873c21d2e9 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/9180 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- Add a missing cond_resched() in bpf_fd_array_map_clear() loop. For PROG_ARRAY maps with many entries this loop calls prog_array_map_poke_run() per entry which can be expensive, and without yielding this can cause RCU stalls under load: rcu: Stack dump where RCU GP kthread last ran: CPU: 0 UID: 0 PID: 30932 Comm: kworker/0:2 Not tainted 6.14.0-13195-g967e8def1100 #2 PREEMPT(undef) Workqueue: events prog_array_map_clear_deferred RIP: 0010:write_comp_data+0x38/0x90 kernel/kcov.c:246 Call Trace: <TASK> prog_array_map_poke_run+0x77/0x380 kernel/bpf/arraymap.c:1096 __fd_array_map_delete_elem+0x197/0x310 kernel/bpf/arraymap.c:925 bpf_fd_array_map_clear kernel/bpf/arraymap.c:1000 [inline] prog_array_map_clear_deferred+0x119/0x1b0 kernel/bpf/arraymap.c:1141 process_one_work+0x898/0x19d0 kernel/workqueue.c:3238 process_scheduled_works kernel/workqueue.c:3319 [inline] worker_thread+0x770/0x10b0 kernel/workqueue.c:3400 kthread+0x465/0x880 kernel/kthread.c:464 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:153 ret_from_fork_asm+0x19/0x30 arch/x86/entry/entry_64.S:245 </TASK> Reviewed-by: Sun Jian <sun.jian.kdev@gmail.com> Fixes: da765a2f5993 ("bpf: Add poke dependency tracking for prog array maps") Signed-off-by: Sechang Lim <rhkrqnwk98@gmail.com> Link: https://lore.kernel.org/r/20260407103823.3942156-1-rhkrqnwk98@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Pu Lehui <pulehui@huawei.com> --- kernel/bpf/arraymap.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c index 7f1ef37c67be..ca135fa407b8 100644 --- a/kernel/bpf/arraymap.c +++ b/kernel/bpf/arraymap.c @@ -960,8 +960,10 @@ static void bpf_fd_array_map_clear(struct bpf_map *map, bool need_defer) struct bpf_array *array = container_of(map, struct bpf_array, map); int i; - for (i = 0; i < array->map.max_entries; i++) + for (i = 0; i < array->map.max_entries; i++) { __fd_array_map_delete_elem(map, &i, need_defer); + cond_resched(); + } } static void prog_array_map_seq_show_elem(struct bpf_map *map, void *key, -- 2.34.1
From: Weiming Shi <bestswngs@gmail.com> mainline inclusion from mainline-v7.1-rc1 commit 5828b9e5b272ecff7cf5d345128d3de7324117f7 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/9180 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- list_next_entry() never returns NULL -- when the current element is the last entry it wraps to the list head via container_of(). The subsequent NULL check is therefore dead code and get_next_key() never returns -ENOENT for the last element, instead reading storage->key from a bogus pointer that aliases internal map fields and copying the result to userspace. Replace it with list_entry_is_head() so the function correctly returns -ENOENT when there are no more entries. Fixes: de9cbbaadba5 ("bpf: introduce cgroup storage maps") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Reviewed-by: Sun Jian <sun.jian.kdev@gmail.com> Acked-by: Paul Chaignon <paul.chaignon@gmail.com> Link: https://lore.kernel.org/r/20260403132951.43533-2-bestswngs@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Pu Lehui <pulehui@huawei.com> --- kernel/bpf/local_storage.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c index a04f505aefe9..c8b8dcc37ed4 100644 --- a/kernel/bpf/local_storage.c +++ b/kernel/bpf/local_storage.c @@ -259,7 +259,7 @@ static int cgroup_storage_get_next_key(struct bpf_map *_map, void *key, goto enoent; storage = list_next_entry(storage, list_map); - if (!storage) + if (list_entry_is_head(storage, &map->list, list_map)) goto enoent; } else { storage = list_first_entry(&map->list, -- 2.34.1
From: MingTao Huang <mintaohuang@tencent.com> mainline inclusion from mainline-v7.1-rc1 commit a1aa9ef47c299c5bbc30594d3c2f0589edf908e6 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/9180 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- When a dev-bound-only BPF program (BPF_F_XDP_DEV_BOUND_ONLY) undergoes JIT compilation with constant blinding enabled (bpf_jit_harden >= 2), bpf_jit_blind_constants() clones the program. The original prog is then freed in bpf_jit_prog_release_other(), which updates aux->prog to point to the surviving clone, but fails to update offload->prog. This leaves offload->prog pointing to the freed original program. When the network namespace is subsequently destroyed, cleanup_net() triggers bpf_dev_bound_netdev_unregister(), which iterates ondev->progs and calls __bpf_prog_offload_destroy(offload->prog). Accessing the freed prog causes a page fault: BUG: unable to handle page fault for address: ffffc900085f1038 Workqueue: netns cleanup_net RIP: 0010:__bpf_prog_offload_destroy+0xc/0x80 Call Trace: __bpf_offload_dev_netdev_unregister+0x257/0x350 bpf_dev_bound_netdev_unregister+0x4a/0x90 unregister_netdevice_many_notify+0x2a2/0x660 ... cleanup_net+0x21a/0x320 The test sequence that triggers this reliably is: 1. Set net.core.bpf_jit_harden=2 (echo 2 > /proc/sys/net/core/bpf_jit_harden) 2. Run xdp_metadata selftest, which creates a dev-bound-only XDP program on a veth inside a netns (./test_progs -t xdp_metadata) 3. cleanup_net -> page fault in __bpf_prog_offload_destroy Dev-bound-only programs are unique in that they have an offload structure but go through the normal JIT path instead of bpf_prog_offload_compile(). This means they are subject to constant blinding's prog clone-and-replace, while also having offload->prog that must stay in sync. Fix this by updating offload->prog in bpf_jit_prog_release_other(), alongside the existing aux->prog update. Both are back-pointers to the prog that must be kept in sync when the prog is replaced. Fixes: 2b3486bc2d23 ("bpf: Introduce device-bound XDP programs") Signed-off-by: MingTao Huang <mintaohuang@tencent.com> Link: https://lore.kernel.org/r/tencent_BCF692F45859CCE6C22B7B0B64827947D406@qq.co... Signed-off-by: Alexei Starovoitov <ast@kernel.org> Conflicts: kernel/bpf/core.c [ctx conflicts] Signed-off-by: Pu Lehui <pulehui@huawei.com> --- kernel/bpf/core.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index d868824d1293..ff20567bbb78 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -1461,6 +1461,8 @@ void bpf_jit_prog_release_other(struct bpf_prog *fp, struct bpf_prog *fp_other) * know whether fp here is the clone or the original. */ fp->aux->prog = fp; + if (fp->aux->offload) + fp->aux->offload->prog = fp; bpf_prog_clone_free(fp_other); } -- 2.34.1
From: Lang Xu <xulang@uniontech.com> mainline inclusion from mainline-v7.1-rc1 commit 576afddfee8d1108ee299bf10f581593540d1a36 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/9180 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- An out-of-bounds read occurs when copying element from a BPF_MAP_TYPE_CGROUP_STORAGE map to another pcpu map with the same value_size that is not rounded up to 8 bytes. The issue happens when: 1. A CGROUP_STORAGE map is created with value_size not aligned to 8 bytes (e.g., 4 bytes) 2. A pcpu map is created with the same value_size (e.g., 4 bytes) 3. Update element in 2 with data in 1 pcpu_init_value assumes that all sources are rounded up to 8 bytes, and invokes copy_map_value_long to make a data copy, However, the assumption doesn't stand since there are some cases where the source may not be rounded up to 8 bytes, e.g., CGROUP_STORAGE, skb->data. the verifier verifies exactly the size that the source claims, not the size rounded up to 8 bytes by kernel, an OOB happens when the source has only 4 bytes while the copy size(4) is rounded up to 8. Fixes: d3bec0138bfb ("bpf: Zero-fill re-used per-cpu map element") Reported-by: Kaiyan Mei <kaiyanm@hust.edu.cn> Closes: https://lore.kernel.org/all/14e6c70c.6c121.19c0399d948.Coremail.kaiyanm@hust... Link: https://lore.kernel.org/r/420FEEDDC768A4BE+20260402074236.2187154-1-xulang@u... Signed-off-by: Lang Xu <xulang@uniontech.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Conflicts: kernel/bpf/hashtab.c [ctx conflicts] Signed-off-by: Pu Lehui <pulehui@huawei.com> --- kernel/bpf/hashtab.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c index fd00a3d956df..24b489dfe167 100644 --- a/kernel/bpf/hashtab.c +++ b/kernel/bpf/hashtab.c @@ -991,7 +991,7 @@ static void pcpu_init_value(struct bpf_htab *htab, void __percpu *pptr, for_each_possible_cpu(cpu) { if (cpu == current_cpu) - copy_map_value_long(&htab->map, per_cpu_ptr(pptr, cpu), value); + copy_map_value(&htab->map, per_cpu_ptr(pptr, cpu), value); else /* Since elem is preallocated, we cannot touch special fields */ zero_map_value(&htab->map, per_cpu_ptr(pptr, cpu)); } -- 2.34.1
From: Mykyta Yatsenko <yatsenko@meta.com> mainline inclusion from mainline-v7.1-rc1 commit 07738bc566c38e0a8c82084e962890d1d59715c8 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/9180 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- When a BPF_F_LOCK update races with a concurrent delete, the freed element can be immediately recycled by alloc_htab_elem(). The fast path in htab_map_update_elem() performs a lockless lookup and then calls copy_map_value_locked() under the element's spin_lock. If alloc_htab_elem() recycles the same memory, it overwrites the value with plain copy_map_value(), without taking the spin_lock, causing torn writes. Use copy_map_value_locked() when BPF_F_LOCK is set so the new element's value is written under the embedded spin_lock, serializing against any stale lock holders. Fixes: 96049f3afd50 ("bpf: introduce BPF_F_LOCK flag") Reported-by: Aaron Esau <aaron1esau@gmail.com> Closes: https://lore.kernel.org/all/CADucPGRvSRpkneb94dPP08YkOHgNgBnskTK6myUag_Mkjim... Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Link: https://lore.kernel.org/r/20260401-bpf_map_torn_writes-v1-1-782d071c55e7@met... Signed-off-by: Alexei Starovoitov <ast@kernel.org> Conflicts: kernel/bpf/hashtab.c [Add map_flags for alloc_htab_elem and use htab_elem_value implementation to avoid compiling error] Signed-off-by: Pu Lehui <pulehui@huawei.com> --- kernel/bpf/hashtab.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c index 24b489dfe167..6363d0565e11 100644 --- a/kernel/bpf/hashtab.c +++ b/kernel/bpf/hashtab.c @@ -1009,7 +1009,7 @@ static bool fd_htab_map_needs_adjust(const struct bpf_htab *htab) static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key, void *value, u32 key_size, u32 hash, bool percpu, bool onallcpus, - struct htab_elem *old_elem) + struct htab_elem *old_elem, u64 map_flags) { u32 size = htab->map.value_size; bool prealloc = htab_is_prealloc(htab); @@ -1073,6 +1073,10 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key, } else if (fd_htab_map_needs_adjust(htab)) { size = round_up(size, 8); memcpy(l_new->key + round_up(key_size, 8), value, size); + } else if (map_flags & BPF_F_LOCK) { + copy_map_value_locked(&htab->map, + l_new->key + round_up(key_size, 8), + value, false); } else { copy_map_value(&htab->map, l_new->key + round_up(key_size, 8), @@ -1174,7 +1178,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value, } l_new = alloc_htab_elem(htab, key, value, key_size, hash, false, false, - l_old); + l_old, map_flags); if (IS_ERR(l_new)) { /* all pre-allocated elements are in use or memory exhausted */ ret = PTR_ERR(l_new); @@ -1330,7 +1334,7 @@ static long __htab_percpu_map_update_elem(struct bpf_map *map, void *key, value, onallcpus); } else { l_new = alloc_htab_elem(htab, key, value, key_size, - hash, true, onallcpus, NULL); + hash, true, onallcpus, NULL, map_flags); if (IS_ERR(l_new)) { ret = PTR_ERR(l_new); goto err; -- 2.34.1
From: David Carlier <devnexen@gmail.com> mainline inclusion from mainline-v7.1-rc1 commit 8ed82f807bb09d2c8455aaa665f2c6cb17bc6a19 category: bugfix bugzilla: https://atomgit.com/openeuler/kernel/issues/9180 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i... -------------------------------- The DEVMAP_HASH branch in dev_map_redirect_multi() uses hlist_for_each_entry_safe() to iterate hash buckets, but this function runs under RCU protection (called from xdp_do_generic_redirect_map() in softirq context). Concurrent writers (__dev_map_hash_update_elem, dev_map_hash_delete_elem) modify the list using RCU primitives (hlist_add_head_rcu, hlist_del_rcu). hlist_for_each_entry_safe() performs plain pointer dereferences without rcu_dereference(), missing the acquire barrier needed to pair with writers' rcu_assign_pointer(). On weakly-ordered architectures (ARM64, POWER), a reader can observe a partially-constructed node. It also defeats CONFIG_PROVE_RCU lockdep validation and KCSAN data-race detection. Replace with hlist_for_each_entry_rcu() using rcu_read_lock_bh_held() as the lockdep condition, consistent with the rcu_dereference_check() used in the DEVMAP (non-hash) branch of the same functions. Also fix the same incorrect lockdep_is_held(&dtab->index_lock) condition in dev_map_enqueue_multi(), where the lock is not held either. Fixes: e624d4ed4aa8 ("xdp: Extend xdp_redirect_map with broadcast support") Signed-off-by: David Carlier <devnexen@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260320072645.16731-1-devnexen@gmail.com Conflicts: kernel/bpf/devmap.c [ctx conflicts] Signed-off-by: Pu Lehui <pulehui@huawei.com> --- kernel/bpf/devmap.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c index 3bdec239be61..b3ce546c0390 100644 --- a/kernel/bpf/devmap.c +++ b/kernel/bpf/devmap.c @@ -642,7 +642,7 @@ int dev_map_enqueue_multi(struct xdp_frame *xdpf, struct net_device *dev_rx, for (i = 0; i < dtab->n_buckets; i++) { head = dev_map_index_hash(dtab, i); hlist_for_each_entry_rcu(dst, head, index_hlist, - lockdep_is_held(&dtab->index_lock)) { + rcu_read_lock_bh_held()) { if (!is_valid_dst(dst, xdpf)) continue; @@ -724,7 +724,6 @@ int dev_map_redirect_multi(struct net_device *dev, struct sk_buff *skb, struct bpf_dtab_netdev *dst, *last_dst = NULL; int excluded_devices[1+MAX_NEST_DEV]; struct hlist_head *head; - struct hlist_node *next; int num_excluded = 0; unsigned int i; int err; @@ -764,7 +763,7 @@ int dev_map_redirect_multi(struct net_device *dev, struct sk_buff *skb, } else { /* BPF_MAP_TYPE_DEVMAP_HASH */ for (i = 0; i < dtab->n_buckets; i++) { head = dev_map_index_hash(dtab, i); - hlist_for_each_entry_safe(dst, next, head, index_hlist) { + hlist_for_each_entry_rcu(dst, head, index_hlist, rcu_read_lock_bh_held()) { if (!dst) continue; -- 2.34.1
反馈: 您发送到kernel@openeuler.org的补丁/补丁集,转换为PR失败! 邮件列表地址:https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/GJJ... 失败原因:调用atomgit api创建PR失败, 失败原因如下: Another open merge request already exists for this source branch: !22756 建议解决方法:请稍等,机器人会在下一次任务重新执行 FeedBack: The patch(es) which you have sent to kernel@openeuler.org has been converted to PR failed! Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/GJJ... Failed Reason: create PR failed when call atomgit's api, failed reason is as follows: Another open merge request already exists for this source branch: !22756 Suggest Solution: please wait, the bot will retry in the next interval
反馈: 您发送到kernel@openeuler.org的补丁/补丁集,已成功转换为PR! PR链接地址: https://atomgit.com/openeuler/kernel/merge_requests/22758 邮件列表地址:https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/GJJ... FeedBack: The patch(es) which you have sent to kernel@openeuler.org mailing list has been converted to a pull request successfully! Pull request link: https://atomgit.com/openeuler/kernel/merge_requests/22758 Mailing list address: https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/GJJ...
participants (2)
-
patchwork bot -
Pu Lehui