[PATCH OLK-6.6 1/2] bpf: unify VM_WRITE vs VM_MAYWRITE use in BPF map mmaping logic

From: Andrii Nakryiko <andrii@kernel.org>

stable inclusion
from stable-v6.6.80
commit fc01ba097319ae2033f118da3075bd783f4a08e9
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBSW0M
CVE: CVE-2025-21853

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...

--------------------------------

[ Upstream commit 98671a0fd1f14e4a518ee06b19037c20014900eb ]

For all BPF maps we ensure that VM_MAYWRITE is cleared when
memory-mapping BPF map contents as an initially read-only VMA. This is
because in some cases the BPF verifier relies on the underlying data not
being modified afterwards by user space, so once something is mapped
read-only, it shouldn't be re-mmap'ed as read-write. As such, it's not
necessary to check VM_MAYWRITE in bpf_map_mmap() and the
map->ops->map_mmap() callbacks: VM_WRITE should be consistently set for
read-write mappings, and if VM_WRITE is not set, there is no way for
user space to upgrade a read-only mapping to a read-write one.

This patch cleans up this VM_WRITE vs VM_MAYWRITE handling within
bpf_map_mmap(), which is the entry point for any BPF map mmap()-ing
logic. We also drop the unnecessary sanitization of VM_MAYWRITE in BPF
ringbuf's map_mmap() callback implementation, as it is already performed
by common code in bpf_map_mmap().

Note, though, that in the bpf_map_mmap_{open,close}() callbacks we can't
drop the VM_MAYWRITE use, because it's possible (and is outside of the
subsystem's control) to have an initially read-write memory mapping
which is subsequently dropped to read-only by user space through
mprotect(). In such a case, from the BPF verifier's POV it's read-write
data throughout the lifetime of the BPF map, and it is counted as an
"active writer". But its VMAs will start out as VM_WRITE|VM_MAYWRITE,
then mprotect() can change them to just VM_MAYWRITE (and no VM_WRITE),
so when the mapping is finally munmap()'ed and bpf_map_mmap_close() is
called, vm_flags will be just VM_MAYWRITE, but we still need to
decrement the active writer count with bpf_map_write_active_dec(), as it
is still considered a read-write mapping by the rest of the BPF
subsystem.

Similar reasoning applies to bpf_map_mmap_open(), which is called
whenever mmap(), munmap(), and/or mprotect() forces the mm subsystem to
split the original VMA into multiple discontiguous VMAs.

Memory-mapping handling is a bit tricky, yes.
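As a concrete illustration of the "no upgrade" guarantee described above
(an editor's sketch, not part of this patch): the program below creates a
BPF_F_MMAPABLE array map via libbpf, maps it read-only, and then attempts
an mprotect() upgrade. The map name and sizes are arbitrary. Because
bpf_map_mmap() cleared VM_MAYWRITE on the read-only mapping, the upgrade
attempt is expected to fail with EACCES.

	/* Hypothetical userspace illustration, not part of this patch.
	 * Build against libbpf; run with CAP_BPF or as root.
	 */
	#include <stdio.h>
	#include <string.h>
	#include <errno.h>
	#include <sys/mman.h>
	#include <unistd.h>
	#include <bpf/bpf.h>
	#include <linux/bpf.h>

	int main(void)
	{
		LIBBPF_OPTS(bpf_map_create_opts, opts,
			    .map_flags = BPF_F_MMAPABLE);
		size_t sz = sysconf(_SC_PAGESIZE);
		int map_fd;
		void *ro;

		map_fd = bpf_map_create(BPF_MAP_TYPE_ARRAY, "demo",
					4, sz, 1, &opts);
		if (map_fd < 0)
			return 1;

		/* Initially read-only mapping: kernel clears VM_MAYWRITE. */
		ro = mmap(NULL, sz, PROT_READ, MAP_SHARED, map_fd, 0);
		if (ro == MAP_FAILED)
			return 1;

		/* Upgrade attempt should fail: PROT_WRITE on a shared
		 * mapping requires VM_MAYWRITE, which was dropped above.
		 */
		if (mprotect(ro, sz, PROT_READ | PROT_WRITE) < 0)
			printf("upgrade refused as expected: %s\n",
			       strerror(errno));

		munmap(ro, sz);
		close(map_fd);
		return 0;
	}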
Cc: Jann Horn <jannh@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20250129012246.1515826-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Stable-dep-of: bc27c52eea18 ("bpf: avoid holding freeze_mutex during mmap operation")
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Pu Lehui <pulehui@huawei.com>
---
 kernel/bpf/ringbuf.c | 4 ----
 kernel/bpf/syscall.c | 10 ++++++++--
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index 00688545742f..6aff5ee483b6 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -268,8 +268,6 @@ static int ringbuf_map_mmap_kern(struct bpf_map *map, struct vm_area_struct *vma
 		/* allow writable mapping for the consumer_pos only */
 		if (vma->vm_pgoff != 0 || vma->vm_end - vma->vm_start != PAGE_SIZE)
 			return -EPERM;
-	} else {
-		vm_flags_clear(vma, VM_MAYWRITE);
 	}
 	/* remap_vmalloc_range() checks size and offset constraints */
 	return remap_vmalloc_range(vma, rb_map->rb,
@@ -289,8 +287,6 @@ static int ringbuf_map_mmap_user(struct bpf_map *map, struct vm_area_struct *vma
 		 * position, and the ring buffer data itself.
 		 */
 		return -EPERM;
-	} else {
-		vm_flags_clear(vma, VM_MAYWRITE);
 	}
 	/* remap_vmalloc_range() checks size and offset constraints */
 	return remap_vmalloc_range(vma, rb_map->rb, vma->vm_pgoff + RINGBUF_PGOFF);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index f8edda3bf396..f79b8915d8c8 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -913,15 +913,21 @@ static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
 	vma->vm_ops = &bpf_map_default_vmops;
 	vma->vm_private_data = map;
 	vm_flags_clear(vma, VM_MAYEXEC);
+	/* If mapping is read-only, then disallow potentially re-mapping with
+	 * PROT_WRITE by dropping VM_MAYWRITE flag. This VM_MAYWRITE clearing
+	 * means that as far as BPF map's memory-mapped VMAs are concerned,
+	 * VM_WRITE and VM_MAYWRITE and equivalent, if one of them is set,
+	 * both should be set, so we can forget about VM_MAYWRITE and always
+	 * check just VM_WRITE
+	 */
 	if (!(vma->vm_flags & VM_WRITE))
-		/* disallow re-mapping with PROT_WRITE */
 		vm_flags_clear(vma, VM_MAYWRITE);
 
 	err = map->ops->map_mmap(map, vma);
 	if (err)
 		goto out;
 
-	if (vma->vm_flags & VM_MAYWRITE)
+	if (vma->vm_flags & VM_WRITE)
 		bpf_map_write_active_inc(map);
 out:
 	mutex_unlock(&map->freeze_mutex);
-- 
2.34.1
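For context on the commit message's note about bpf_map_mmap_{open,close}(),
those callbacks look roughly like this (paraphrased from
kernel/bpf/syscall.c of this era; they are not part of this diff). They
deliberately test VM_MAYWRITE, which survives a later mprotect()
downgrade, rather than VM_WRITE:

	/* Paraphrased for illustration. VM_MAYWRITE, unlike VM_WRITE,
	 * survives mprotect(PROT_READ), so it identifies mappings that
	 * were created read-write and hence counted as active writers.
	 */
	static void bpf_map_mmap_open(struct vm_area_struct *vma)
	{
		struct bpf_map *map = vma->vm_file->private_data;

		if (vma->vm_flags & VM_MAYWRITE)
			bpf_map_write_active_inc(map);
	}

	static void bpf_map_mmap_close(struct vm_area_struct *vma)
	{
		struct bpf_map *map = vma->vm_file->private_data;

		if (vma->vm_flags & VM_MAYWRITE)
			bpf_map_write_active_dec(map);
	}

This keeps the writer count balanced even when a VMA is split or
downgraded between mmap() and munmap().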

[PATCH OLK-6.6 2/2] bpf: avoid holding freeze_mutex during mmap operation

From: Andrii Nakryiko <andrii@kernel.org>

stable inclusion
from stable-v6.6.80
commit 29cfda62ab4d92ab94123813db49ab76c1e61b29
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBSW0M
CVE: CVE-2025-21853

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...

--------------------------------

[ Upstream commit bc27c52eea189e8f7492d40739b7746d67b65beb ]

We use map->freeze_mutex to prevent races between map_freeze() and
memory-mapping BPF map contents with writable permissions. The way we
naively do this means we'll hold freeze_mutex for the entire duration of
all the mm and VMA manipulations, which is completely unnecessary. This
can potentially also lead to deadlocks, as reported by syzbot in [0].

So, instead, hold freeze_mutex only during the writeability checks, bump
(proactively) the "write active" count for the map, unlock the mutex,
and proceed with the mmap logic. Only if something goes wrong during the
mmap logic do we undo that "write active" counter increment.

[0] https://lore.kernel.org/bpf/678dcbc9.050a0220.303755.0066.GAE@google.com/

Fixes: fc9702273e2e ("bpf: Add mmap() support for BPF_MAP_TYPE_ARRAY")
Reported-by: syzbot+4dc041c686b7c816a71e@syzkaller.appspotmail.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20250129012246.1515826-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Pu Lehui <pulehui@huawei.com>
---
 kernel/bpf/syscall.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index f79b8915d8c8..35366f061c3d 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -883,7 +883,7 @@ static const struct vm_operations_struct bpf_map_default_vmops = {
 static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
 {
 	struct bpf_map *map = filp->private_data;
-	int err;
+	int err = 0;
 
 	if (!map->ops->map_mmap || !IS_ERR_OR_NULL(map->record))
 		return -ENOTSUPP;
@@ -907,7 +907,12 @@ static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
 			err = -EACCES;
 			goto out;
 		}
+		bpf_map_write_active_inc(map);
 	}
+out:
+	mutex_unlock(&map->freeze_mutex);
+	if (err)
+		return err;
 
 	/* set default open/close callbacks */
 	vma->vm_ops = &bpf_map_default_vmops;
@@ -924,13 +929,11 @@ static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
 		vm_flags_clear(vma, VM_MAYWRITE);
 
 	err = map->ops->map_mmap(map, vma);
-	if (err)
-		goto out;
+	if (err) {
+		if (vma->vm_flags & VM_WRITE)
+			bpf_map_write_active_dec(map);
+	}
 
-	if (vma->vm_flags & VM_WRITE)
-		bpf_map_write_active_inc(map);
-out:
-	mutex_unlock(&map->freeze_mutex);
 	return err;
 }
-- 
2.34.1
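Putting the two diffs together, the resulting bpf_map_mmap() control flow
can be sketched as below (an editor's condensed paraphrase; the
BPF_F_RDONLY_PROG check and other details from kernel/bpf/syscall.c are
elided). The key point is that freeze_mutex is held only around the
writability check and the proactive writer-count bump, never across the
mm/VMA manipulation done by map->ops->map_mmap():

	/* Condensed sketch of bpf_map_mmap() after both patches. */
	static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
	{
		struct bpf_map *map = filp->private_data;
		int err = 0;

		mutex_lock(&map->freeze_mutex);
		if (vma->vm_flags & VM_WRITE) {
			if (map->frozen)
				err = -EPERM;
			else
				/* proactively count the writer up front */
				bpf_map_write_active_inc(map);
		}
		mutex_unlock(&map->freeze_mutex);
		if (err)
			return err;

		/* ... set vm_ops, clear VM_MAYEXEC, and for read-only
		 * mappings clear VM_MAYWRITE, all without freeze_mutex ...
		 */

		err = map->ops->map_mmap(map, vma);
		if (err && (vma->vm_flags & VM_WRITE))
			bpf_map_write_active_dec(map); /* undo the bump */

		return err;
	}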

FeedBack: The patch(es) you sent to kernel@openeuler.org failed to be
converted to a PR!
Mailing list address:
https://mailweb.openeuler.org/archives/list/kernel@openeuler.org/message/M4D...
Failed reason: the cover letter of the patch set is missing.
Suggested solution: please add a cover letter to the patch set and resend
it to the mailing list.