From: Jens Axboe <axboe@kernel.dk>
mainline inclusion
from mainline-v6.10-rc2
commit 7eb75ce7527129d7f1fee6951566af409a37a1c4
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBEAP3
CVE: CVE-2024-56584
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
--------------------------------
syzbot triggered the following WARN_ON:
WARNING: CPU: 0 PID: 16 at io_uring/tctx.c:51 __io_uring_free+0xfa/0x140 io_uring/tctx.c:51
which is the
WARN_ON_ONCE(!xa_empty(&tctx->xa));
sanity check in __io_uring_free() when an io_uring_task is going through
its final put. The syzbot test case includes injecting memory allocation
failures, and it very much looks like xa_store() can fail one of its
memory allocations and end up with ->head being non-NULL even though no
entries exist in the xarray.
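For context, xa_empty() just tests xa->xa_head == NULL, while the
registration path stores the node roughly as below (a simplified sketch
modeled on __io_uring_add_tctx_node(); error handling abridged). If
xa_store() fails an internal allocation after the tree has been touched,
->head can be left non-NULL with no entry actually stored:
	struct io_tctx_node *node;
	int ret;
	node = kmalloc(sizeof(*node), GFP_KERNEL);
	if (!node)
		return -ENOMEM;
	node->ctx = ctx;
	node->task = current;
	/* xa_store() may allocate internal xarray nodes; under fault
	 * injection that allocation can fail in a way that leaves
	 * ->head set even though nothing was stored.
	 */
	ret = xa_err(xa_store(&tctx->xa, (unsigned long)ctx, node, GFP_KERNEL));
	if (ret) {
		kfree(node);
		return ret;
	}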
Until this issue gets sorted out, work around it by attempting to
iterate entries in our xarray, and WARN_ON_ONCE() if one is found.
Reported-by: syzbot+cc36d44ec9f368e443d3@syzkaller.appspotmail.com
Link: https://lore.kernel.org/io-uring/673c1643.050a0220.87769.0066.GAE@google.co…
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Long Li <leo.lilong@huawei.com>
---
io_uring/tctx.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/io_uring/tctx.c b/io_uring/tctx.c
index c043fe93a3f2..84f6a8385720 100644
--- a/io_uring/tctx.c
+++ b/io_uring/tctx.c
@@ -47,8 +47,19 @@ static struct io_wq *io_init_wq_offload(struct io_ring_ctx *ctx,
void __io_uring_free(struct task_struct *tsk)
{
struct io_uring_task *tctx = tsk->io_uring;
+ struct io_tctx_node *node;
+ unsigned long index;
- WARN_ON_ONCE(!xa_empty(&tctx->xa));
+ /*
+ * Fault injection forcing allocation errors in the xa_store() path
+ * can lead to xa_empty() returning false, even though no actual
+ * node is stored in the xarray. Until that gets sorted out, attempt
+ * an iteration here and warn if any entries are found.
+ */
+ xa_for_each(&tctx->xa, index, node) {
+ WARN_ON_ONCE(1);
+ break;
+ }
WARN_ON_ONCE(tctx->io_wq);
WARN_ON_ONCE(tctx->cached_refs);
--
2.39.2
From: Jens Axboe <axboe@kernel.dk>
mainline inclusion
from mainline-v6.10-rc2
commit 7eb75ce7527129d7f1fee6951566af409a37a1c4
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBEAP3
CVE: CVE-2024-56584
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
--------------------------------
syzbot triggered the following WARN_ON:
WARNING: CPU: 0 PID: 16 at io_uring/tctx.c:51 __io_uring_free+0xfa/0x140 io_uring/tctx.c:51
which is the
WARN_ON_ONCE(!xa_empty(&tctx->xa));
sanity check in __io_uring_free() when an io_uring_task is going through
its final put. The syzbot test case includes injecting memory allocation
failures, and it very much looks like xa_store() can fail one of its
memory allocations and end up with ->head being non-NULL even though no
entries exist in the xarray.
Until this issue gets sorted out, work around it by attempting to
iterate entries in our xarray, and WARN_ON_ONCE() if one is found.
Reported-by: syzbot+cc36d44ec9f368e443d3@syzkaller.appspotmail.com
Link: https://lore.kernel.org/io-uring/673c1643.050a0220.87769.0066.GAE@google.co…
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Conflicts:
io_uring/io_uring.c
io_uring/tctx.c
[Conflicts due to __io_uring_free() having moved to io_uring/tctx.c in mainline]
Signed-off-by: Long Li <leo.lilong@huawei.com>
---
io_uring/io_uring.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 4d69fb4cf803..e058ce8bb8e2 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -8341,8 +8341,19 @@ static int io_uring_alloc_task_context(struct task_struct *task,
void __io_uring_free(struct task_struct *tsk)
{
struct io_uring_task *tctx = tsk->io_uring;
+ struct io_tctx_node *node;
+ unsigned long index;
- WARN_ON_ONCE(!xa_empty(&tctx->xa));
+ /*
+ * Fault injection forcing allocation errors in the xa_store() path
+ * can lead to xa_empty() returning false, even though no actual
+ * node is stored in the xarray. Until that gets sorted out, attempt
+ * an iteration here and warn if any entries are found.
+ */
+ xa_for_each(&tctx->xa, index, node) {
+ WARN_ON_ONCE(1);
+ break;
+ }
WARN_ON_ONCE(tctx->io_wq);
WARN_ON_ONCE(tctx->cached_refs);
--
2.39.2
From: Jens Axboe <axboe@kernel.dk>
mainline inclusion
from mainline-v6.10-rc2
commit 7eb75ce7527129d7f1fee6951566af409a37a1c4
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBEAP3
CVE: CVE-2024-56584
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
--------------------------------
syzbot triggered the following WARN_ON:
WARNING: CPU: 0 PID: 16 at io_uring/tctx.c:51 __io_uring_free+0xfa/0x140 io_uring/tctx.c:51
which is the
WARN_ON_ONCE(!xa_empty(&tctx->xa));
sanity check in __io_uring_free() when an io_uring_task is going through
its final put. The syzbot test case includes injecting memory allocation
failures, and it very much looks like xa_store() can fail one of its
memory allocations and end up with ->head being non-NULL even though no
entries exist in the xarray.
Until this issue gets sorted out, work around it by attempting to
iterate entries in our xarray, and WARN_ON_ONCE() if one is found.
Reported-by: syzbot+cc36d44ec9f368e443d3@syzkaller.appspotmail.com
Link: https://lore.kernel.org/io-uring/673c1643.050a0220.87769.0066.GAE@google.co…
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Conflicts:
io_uring/io_uring.c
io_uring/tctx.c
[Conflicts due to __io_uring_free() having moved to io_uring/tctx.c in mainline]
Signed-off-by: Long Li <leo.lilong@huawei.com>
---
io_uring/io_uring.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 6e5e00a7692c..170b3dbf0750 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -8537,8 +8537,19 @@ static int io_uring_alloc_task_context(struct task_struct *task,
void __io_uring_free(struct task_struct *tsk)
{
struct io_uring_task *tctx = tsk->io_uring;
+ struct io_tctx_node *node;
+ unsigned long index;
- WARN_ON_ONCE(!xa_empty(&tctx->xa));
+ /*
+ * Fault injection forcing allocation errors in the xa_store() path
+ * can lead to xa_empty() returning false, even though no actual
+ * node is stored in the xarray. Until that gets sorted out, attempt
+ * an iteration here and warn if any entries are found.
+ */
+ xa_for_each(&tctx->xa, index, node) {
+ WARN_ON_ONCE(1);
+ break;
+ }
WARN_ON_ONCE(tctx->io_wq);
WARN_ON_ONCE(tctx->cached_refs);
--
2.39.2
tree: https://gitee.com/openeuler/kernel.git OLK-5.10
head: 7059ffd1a85492c66d87331537ee3e4bdf8bf609
commit: c55fa11d134b40dbe1a4a5512a7fe43497cb6d5e [2620/2620] fscache: limit fscache_object_max_active to avoid blocking
config: x86_64-buildonly-randconfig-004-20250102 (https://download.01.org/0day-ci/archive/20250102/202501021908.g3jjkIa6-lkp@…)
compiler: clang version 19.1.3 (https://github.com/llvm/llvm-project ab51eccf88f5321e7c60591c5546b254b6afab99)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250102/202501021908.g3jjkIa6-lkp@…)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202501021908.g3jjkIa6-lkp@intel.com/
All warnings (new ones prefixed by >>):
In file included from fs/fscache/main.c:16:
In file included from fs/fscache/internal.h:27:
In file included from include/linux/fscache-cache.h:17:
In file included from include/linux/fscache.h:19:
In file included from include/linux/pagemap.h:8:
In file included from include/linux/mm.h:1587:
include/linux/vmstat.h:431:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
431 | return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
| ~~~~~~~~~~~ ^ ~~~
>> fs/fscache/main.c:51:21: warning: unused variable 'fscache_min_object_max_active' [-Wunused-variable]
51 | static unsigned int fscache_min_object_max_active = FSCACHE_MIN_OBJECT_MAX_ACTIVE;
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> fs/fscache/main.c:52:21: warning: unused variable 'fscache_min_op_max_active' [-Wunused-variable]
52 | static unsigned int fscache_min_op_max_active = FSCACHE_MIN_OBJECT_MAX_ACTIVE / 2;
| ^~~~~~~~~~~~~~~~~~~~~~~~~
3 warnings generated.
vim +/fscache_min_object_max_active +51 fs/fscache/main.c
46
47 /* these values serve as lower bounds, will be adjusted in fscache_init() */
48 #define FSCACHE_MIN_OBJECT_MAX_ACTIVE 4
49 static unsigned int fscache_object_max_active = FSCACHE_MIN_OBJECT_MAX_ACTIVE;
50 static unsigned int fscache_op_max_active = FSCACHE_MIN_OBJECT_MAX_ACTIVE / 2;
> 51 static unsigned int fscache_min_object_max_active = FSCACHE_MIN_OBJECT_MAX_ACTIVE;
> 52 static unsigned int fscache_min_op_max_active = FSCACHE_MIN_OBJECT_MAX_ACTIVE / 2;
53
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
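A minimal sketch of one way to address the two warnings, assuming (as
the names suggest) the *_min_* lower bounds are only meant to be read
by a sysctl table that this randconfig compiles out; the variable names
come from the report, but the __maybe_unused fix shown here is
hypothetical, not necessarily the one actually applied:
	static unsigned int __maybe_unused fscache_min_object_max_active =
		FSCACHE_MIN_OBJECT_MAX_ACTIVE;
	static unsigned int __maybe_unused fscache_min_op_max_active =
		FSCACHE_MIN_OBJECT_MAX_ACTIVE / 2;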
From: Hou Tao <houtao1@huawei.com>
stable inclusion
from stable-v6.6.66
commit 10e8a2dec9ff1b81de8e892b0850924038adbc6d
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IBEANW
CVE: CVE-2024-56592
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…
--------------------------------
For an htab of maps, when a map is removed from the htab, the htab may
hold the last reference to that map. bpf_map_fd_put_ptr() will then
invoke bpf_map_free_id() to free the id of the removed map element.
However, bpf_map_fd_put_ptr() is invoked while holding a bucket lock
(raw_spin_lock_t), and bpf_map_free_id() attempts to acquire map_idr_lock
(spinlock_t), triggering the following lockdep warning:
=============================
[ BUG: Invalid wait context ]
6.11.0-rc4+ #49 Not tainted
-----------------------------
test_maps/4881 is trying to lock:
ffffffff84884578 (map_idr_lock){+...}-{3:3}, at: bpf_map_free_id.part.0+0x21/0x70
other info that might help us debug this:
context-{5:5}
2 locks held by test_maps/4881:
#0: ffffffff846caf60 (rcu_read_lock){....}-{1:3}, at: bpf_fd_htab_map_update_elem+0xf9/0x270
#1: ffff888149ced148 (&htab->lockdep_key#2){....}-{2:2}, at: htab_map_update_elem+0x178/0xa80
stack backtrace:
CPU: 0 UID: 0 PID: 4881 Comm: test_maps Not tainted 6.11.0-rc4+ #49
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ...
Call Trace:
<TASK>
dump_stack_lvl+0x6e/0xb0
dump_stack+0x10/0x20
__lock_acquire+0x73e/0x36c0
lock_acquire+0x182/0x450
_raw_spin_lock_irqsave+0x43/0x70
bpf_map_free_id.part.0+0x21/0x70
bpf_map_put+0xcf/0x110
bpf_map_fd_put_ptr+0x9a/0xb0
free_htab_elem+0x69/0xe0
htab_map_update_elem+0x50f/0xa80
bpf_fd_htab_map_update_elem+0x131/0x270
htab_map_update_elem+0x50f/0xa80
bpf_fd_htab_map_update_elem+0x131/0x270
bpf_map_update_value+0x266/0x380
__sys_bpf+0x21bb/0x36b0
__x64_sys_bpf+0x45/0x60
x64_sys_call+0x1b2a/0x20d0
do_syscall_64+0x5d/0x100
entry_SYSCALL_64_after_hwframe+0x76/0x7e
One way to fix the lockdep warning is to use raw_spinlock_t for
map_idr_lock as well. However, bpf_map_alloc_id() invokes
idr_alloc_cyclic() after acquiring map_idr_lock, and that would trigger
a similar lockdep warning, because the slab's lock (s->cpu_slab->lock)
is still a spinlock.
Instead of changing map_idr_lock's type, fix the issue by invoking
htab_put_fd_value() after htab_unlock_bucket(). However, deferring only
the invocation of htab_put_fd_value() is not enough, because the old map
pointers in an htab of maps cannot be saved during batched deletion.
Therefore, also defer the invocation of free_htab_elem(), so the
to-be-freed elements can be linked together, similar to the lru map, as
sketched below.
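A condensed sketch of that pattern, mirroring the batch-deletion hunk in
the diff below (names as in kernel/bpf/hashtab.c):
	/* while holding the bucket lock: unlink and chain, don't free */
	hlist_nulls_del_rcu(&l->hash_node);
	l->batch_flink = node_to_free;	/* safe: l is off the hash llist */
	node_to_free = l;

	htab_unlock_bucket(htab, b, hash, flags);

	/* bucket lock dropped: freeing may now take map_idr_lock */
	while (node_to_free) {
		l = node_to_free;
		node_to_free = node_to_free->batch_flink;
		free_htab_elem(htab, l);
	}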
There are four callers for ->map_fd_put_ptr:
(1) alloc_htab_elem() (through htab_put_fd_value())
It invokes ->map_fd_put_ptr() under a raw_spinlock_t. The invocation of
htab_put_fd_value() cannot simply be moved after htab_unlock_bucket(),
because the old element has already been stashed in htab->extra_elems.
It may be reused immediately after htab_unlock_bucket(), and invoking
htab_put_fd_value() only after htab_unlock_bucket() may then release the
newly-added element incorrectly. Therefore, save the map pointer of the
old element for htab of maps before unlocking the bucket, and release
the map_ptr after unlock. Besides the map pointer, the same must be done
for the special fields in the old element.
(2) free_htab_elem() (through htab_put_fd_value())
Its caller includes __htab_map_lookup_and_delete_elem(),
htab_map_delete_elem() and __htab_map_lookup_and_delete_batch().
For htab_map_delete_elem(), simply invoke free_htab_elem() after
htab_unlock_bucket(). For __htab_map_lookup_and_delete_batch(), just
like the lru map, link the to-be-freed elements into a node_to_free list
and invoke free_htab_elem() for these elements after unlock. It is safe
to reuse batch_flink as the link for node_to_free, because these
elements have been removed from the hash llist.
Because htab of maps doesn't support the lookup_and_delete operation,
__htab_map_lookup_and_delete_elem() doesn't have the problem, so it is
kept as-is.
(3) fd_htab_map_free()
It invokes ->map_fd_put_ptr() without holding a raw_spinlock_t.
(4) bpf_fd_htab_map_update_elem()
It invokes ->map_fd_put_ptr() without holding a raw_spinlock_t.
After moving free_htab_elem() outside the htab bucket lock scope, use
pcpu_freelist_push() instead of __pcpu_freelist_push() so that irqs are
disabled before freeing elements, and protect the invocations of
bpf_mem_cache_free() with a migrate_{disable|enable} pair.
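For reference, pcpu_freelist_push() is just the irq-safe wrapper around
__pcpu_freelist_push() (roughly its shape in
kernel/bpf/percpu_freelist.c), which is why switching to it keeps irqs
disabled while elements are pushed back to the freelist:
	void pcpu_freelist_push(struct pcpu_freelist *s,
				struct pcpu_freelist_node *node)
	{
		unsigned long flags;

		local_irq_save(flags);
		__pcpu_freelist_push(s, node);
		local_irq_restore(flags);
	}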
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241106063542.357743-2-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Xiaomeng Zhang <zhangxiaomeng13@huawei.com>
---
kernel/bpf/hashtab.c | 56 ++++++++++++++++++++++++++++++--------------
1 file changed, 39 insertions(+), 17 deletions(-)
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 7c64ad4f3732..fc34f72702cc 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -892,9 +892,12 @@ static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
static void htab_elem_free(struct bpf_htab *htab, struct htab_elem *l)
{
check_and_free_fields(htab, l);
+
+ migrate_disable();
if (htab->map.map_type == BPF_MAP_TYPE_PERCPU_HASH)
bpf_mem_cache_free(&htab->pcpu_ma, l->ptr_to_pptr);
bpf_mem_cache_free(&htab->ma, l);
+ migrate_enable();
}
static void htab_put_fd_value(struct bpf_htab *htab, struct htab_elem *l)
@@ -944,7 +947,7 @@ static void free_htab_elem(struct bpf_htab *htab, struct htab_elem *l)
if (htab_is_prealloc(htab)) {
bpf_map_dec_elem_count(&htab->map);
check_and_free_fields(htab, l);
- __pcpu_freelist_push(&htab->freelist, &l->fnode);
+ pcpu_freelist_push(&htab->freelist, &l->fnode);
} else {
dec_elem_count(htab);
htab_elem_free(htab, l);
@@ -1014,7 +1017,6 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
*/
pl_new = this_cpu_ptr(htab->extra_elems);
l_new = *pl_new;
- htab_put_fd_value(htab, old_elem);
*pl_new = old_elem;
} else {
struct pcpu_freelist_node *l;
@@ -1100,6 +1102,7 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value,
struct htab_elem *l_new = NULL, *l_old;
struct hlist_nulls_head *head;
unsigned long flags;
+ void *old_map_ptr;
struct bucket *b;
u32 key_size, hash;
int ret;
@@ -1178,12 +1181,27 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value,
hlist_nulls_add_head_rcu(&l_new->hash_node, head);
if (l_old) {
hlist_nulls_del_rcu(&l_old->hash_node);
+
+ /* l_old has already been stashed in htab->extra_elems, free
+ * its special fields before it is available for reuse. Also
+ * save the old map pointer in htab of maps before unlock
+ * and release it after unlock.
+ */
+ old_map_ptr = NULL;
+ if (htab_is_prealloc(htab)) {
+ if (map->ops->map_fd_put_ptr)
+ old_map_ptr = fd_htab_map_get_ptr(map, l_old);
+ check_and_free_fields(htab, l_old);
+ }
+ }
+ htab_unlock_bucket(htab, b, hash, flags);
+ if (l_old) {
+ if (old_map_ptr)
+ map->ops->map_fd_put_ptr(map, old_map_ptr, true);
if (!htab_is_prealloc(htab))
free_htab_elem(htab, l_old);
- else
- check_and_free_fields(htab, l_old);
}
- ret = 0;
+ return 0;
err:
htab_unlock_bucket(htab, b, hash, flags);
return ret;
@@ -1427,15 +1445,15 @@ static long htab_map_delete_elem(struct bpf_map *map, void *key)
return ret;
l = lookup_elem_raw(head, hash, key, key_size);
-
- if (l) {
+ if (l)
hlist_nulls_del_rcu(&l->hash_node);
- free_htab_elem(htab, l);
- } else {
+ else
ret = -ENOENT;
- }
htab_unlock_bucket(htab, b, hash, flags);
+
+ if (l)
+ free_htab_elem(htab, l);
return ret;
}
@@ -1842,13 +1860,14 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
* may cause deadlock. See comments in function
* prealloc_lru_pop(). Let us do bpf_lru_push_free()
* after releasing the bucket lock.
+ *
+ * For htab of maps, htab_put_fd_value() in
+ * free_htab_elem() may acquire a spinlock with bucket
+ * lock being held and it violates the lock rule, so
+ * invoke free_htab_elem() after unlock as well.
*/
- if (is_lru_map) {
- l->batch_flink = node_to_free;
- node_to_free = l;
- } else {
- free_htab_elem(htab, l);
- }
+ l->batch_flink = node_to_free;
+ node_to_free = l;
}
dst_key += key_size;
dst_val += value_size;
@@ -1860,7 +1879,10 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
while (node_to_free) {
l = node_to_free;
node_to_free = node_to_free->batch_flink;
- htab_lru_push_free(htab, l);
+ if (is_lru_map)
+ htab_lru_push_free(htab, l);
+ else
+ free_htab_elem(htab, l);
}
next_batch:
--
2.34.1