From: Haifeng Xu <haifeng.xu(a)shopee.com>
stable inclusion
from stable-v4.19.317
commit c23ead9986a17c793d39be11ce6c084904c9c44a
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IAJLGS
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
--------------------------------
[ Upstream commit 74751ef5c1912ebd3e65c3b65f45587e05ce5d36 ]
In our production environment, we found many hung tasks that had been
blocked for more than 18 hours. Their call traces look like this:
[346278.191038] __schedule+0x2d8/0x890
[346278.191046] schedule+0x4e/0xb0
[346278.191049] perf_event_free_task+0x220/0x270
[346278.191056] ? init_wait_var_entry+0x50/0x50
[346278.191060] copy_process+0x663/0x18d0
[346278.191068] kernel_clone+0x9d/0x3d0
[346278.191072] __do_sys_clone+0x5d/0x80
[346278.191076] __x64_sys_clone+0x25/0x30
[346278.191079] do_syscall_64+0x5c/0xc0
[346278.191083] ? syscall_exit_to_user_mode+0x27/0x50
[346278.191086] ? do_syscall_64+0x69/0xc0
[346278.191088] ? irqentry_exit_to_user_mode+0x9/0x20
[346278.191092] ? irqentry_exit+0x19/0x30
[346278.191095] ? exc_page_fault+0x89/0x160
[346278.191097] ? asm_exc_page_fault+0x8/0x30
[346278.191102] entry_SYSCALL_64_after_hwframe+0x44/0xae
The task was waiting for the refcount to become 1, but the vmcore showed
that the refcount was already 1. It seems that the task was never woken
up by perf_event_release_kernel() and got stuck forever. The scenario
below can cause the problem.
Thread A                                Thread B
...                                     ...
perf_event_free_task                    perf_event_release_kernel
                                        ...
                                        acquire event->child_mutex
                                        ...
                                        get_ctx
   ...                                  release event->child_mutex
   acquire ctx->mutex
   ...
   perf_free_event (acquire/release event->child_mutex)
   ...
   release ctx->mutex
   wait_var_event
                                        acquire ctx->mutex
                                        acquire event->child_mutex
                                        # move existing events to free_list
                                        release event->child_mutex
                                        release ctx->mutex
                                        put_ctx
...                                     ...
In this case, Thread A has already freed all events of the ctx, so
Thread B finds nothing for this ctx on free_list and never issues the
wakeup that Thread A is waiting for. It is thus necessary to add a
wakeup after dropping the reference.
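The fix follows the general wait_var_event()/wake_up_var() rule: whoever changes the variable a task is waiting on must issue a wakeup afterwards. The following is a minimal userspace model of that rule, assuming a pthread mutex and condition variable stand in for the kernel's hashed wait queue. The names waiter(), releaser(), wake_up_var_model() and run_release_model() are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the wait queue keyed on &ctx->refcount. */
static pthread_mutex_t var_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t var_cond = PTHREAD_COND_INITIALIZER;
static atomic_int ctx_refcount;

/* Thread A: models wait_var_event(&ctx->refcount, refcount == 1). */
static void *waiter(void *unused)
{
	(void)unused;
	pthread_mutex_lock(&var_lock);
	while (atomic_load(&ctx_refcount) != 1)
		pthread_cond_wait(&var_cond, &var_lock);
	assert(atomic_load(&ctx_refcount) == 1);
	pthread_mutex_unlock(&var_lock);
	return NULL;
}

/* Models wake_up_var(): taking the lock orders the earlier decrement
 * before the wakeup as seen by the waiter. */
static void wake_up_var_model(void)
{
	pthread_mutex_lock(&var_lock);
	pthread_cond_broadcast(&var_cond);
	pthread_mutex_unlock(&var_lock);
}

/* Thread B: drop the reference (put_ctx), then always wake, which is
 * the step the patch adds. */
static void *releaser(void *unused)
{
	(void)unused;
	atomic_fetch_sub(&ctx_refcount, 1);
	wake_up_var_model();
	return NULL;
}

/* Runs both threads once; returns the final refcount, or -1 on error. */
int run_release_model(void)
{
	pthread_t a, b;

	/* One reference for the task, one taken by the release path's get_ctx(). */
	atomic_store(&ctx_refcount, 2);
	if (pthread_create(&a, NULL, waiter, NULL) != 0)
		return -1;
	if (pthread_create(&b, NULL, releaser, NULL) != 0) {
		releaser(NULL); /* let the waiter finish before bailing out */
		pthread_join(a, NULL);
		return -1;
	}
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return atomic_load(&ctx_refcount);
}
```

Build with -pthread; run_release_model() should return 1. If the wake_up_var_model() call is deleted from releaser(), a waiter that is already asleep when the reference is dropped is never woken, which is the hung task above in miniature. The lock around the broadcast loosely plays the role of the smp_mb() in the patch.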
Fixes: 1cf8dfe8a661 ("perf/core: Fix race between close() and fork()")
Signed-off-by: Haifeng Xu <haifeng.xu(a)shopee.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Reviewed-by: Frederic Weisbecker <frederic(a)kernel.org>
Acked-by: Mark Rutland <mark.rutland(a)arm.com>
Cc: stable(a)vger.kernel.org
Link: https://lkml.kernel.org/r/20240513103948.33570-1-haifeng.xu@shopee.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Xiaomeng Zhang <zhangxiaomeng13(a)huawei.com>
---
kernel/events/core.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index b9667223b050..b7f4aaedb5ff 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4664,6 +4664,7 @@ int perf_event_release_kernel(struct perf_event *event)
again:
mutex_lock(&event->child_mutex);
list_for_each_entry(child, &event->child_list, child_list) {
+ void *var = NULL;
/*
* Cannot change, child events are not migrated, see the
@@ -4704,11 +4705,23 @@ int perf_event_release_kernel(struct perf_event *event)
* this can't be the last reference.
*/
put_event(event);
+ } else {
+ var = &ctx->refcount;
}
mutex_unlock(&event->child_mutex);
mutex_unlock(&ctx->mutex);
put_ctx(ctx);
+
+ if (var) {
+ /*
+ * If perf_event_free_task() has deleted all events from the
+ * ctx while the child_mutex got released above, make sure to
+ * notify about the preceding put_ctx().
+ */
+ smp_mb(); /* pairs with wait_var_event() */
+ wake_up_var(var);
+ }
goto again;
}
mutex_unlock(&event->child_mutex);
--
2.34.1
tree: https://gitee.com/openeuler/kernel.git OLK-5.10
head: b7bed6628b750ffd687d1da0a170dece4b0c08bd
commit: 6cf173c15990725cb9c0c4570fbc90e937a757b4 [27098/30000] drivers: initial support for rnpvf drivers from Mucse Technology
config: x86_64-allyesconfig (https://download.01.org/0day-ci/archive/20240903/202409031941.6m1LMpai-lkp@…)
compiler: clang version 18.1.5 (https://github.com/llvm/llvm-project 617a15a9eac96088ae5e9134248d8236e34b91b1)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240903/202409031941.6m1LMpai-lkp@…)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202409031941.6m1LMpai-lkp@intel.com/
All warnings (new ones prefixed by >>):
drivers/net/ethernet/mucse/rnpvf/rnpvf_main.c:112: warning: Function parameter or member 'rnpvf_queue' not described in 'rnpvf_set_ring_vector'
drivers/net/ethernet/mucse/rnpvf/rnpvf_main.c:112: warning: Function parameter or member 'rnpvf_msix_vector' not described in 'rnpvf_set_ring_vector'
>> drivers/net/ethernet/mucse/rnpvf/rnpvf_main.c:112: warning: Excess function parameter 'direction' description in 'rnpvf_set_ring_vector'
>> drivers/net/ethernet/mucse/rnpvf/rnpvf_main.c:112: warning: Excess function parameter 'queue' description in 'rnpvf_set_ring_vector'
>> drivers/net/ethernet/mucse/rnpvf/rnpvf_main.c:112: warning: Excess function parameter 'msix_vector' description in 'rnpvf_set_ring_vector'
drivers/net/ethernet/mucse/rnpvf/rnpvf_main.c:807: warning: Excess function parameter 'rx_ring' description in 'rnpvf_pull_tail'
drivers/net/ethernet/mucse/rnpvf/rnpvf_main.c:1030: warning: Excess function parameter 'skb' description in 'rnpvf_is_non_eop'
drivers/net/ethernet/mucse/rnpvf/rnpvf_main.c:2203: warning: Function parameter or member 'type' not described in 'rnpvf_update_itr'
drivers/net/ethernet/mucse/rnpvf/rnpvf_main.c:3922: warning: Function parameter or member 't' not described in 'rnpvf_watchdog'
drivers/net/ethernet/mucse/rnpvf/rnpvf_main.c:3922: warning: Excess function parameter 'data' description in 'rnpvf_watchdog'
vim +112 drivers/net/ethernet/mucse/rnpvf/rnpvf_main.c
102
103 /**
104 * rnpvf_set_ivar - set IVAR registers - maps interrupt causes to vectors
105 * @adapter: pointer to adapter struct
106 * @direction: 0 for Rx, 1 for Tx, -1 for other causes
107 * @queue: queue to map the corresponding interrupt to
108 * @msix_vector: the vector to map to the corresponding queue
109 */
110 static void rnpvf_set_ring_vector(struct rnpvf_adapter *adapter,
111 u8 rnpvf_queue, u8 rnpvf_msix_vector)
> 112 {
113 struct rnpvf_hw *hw = &adapter->hw;
114 u32 data = 0;
115
116 data = hw->vfnum << 24;
117 data |= (rnpvf_msix_vector << 8);
118 data |= (rnpvf_msix_vector << 0);
119 DPRINTK(IFUP, INFO,
120 "Set Ring-Vector queue:%d (reg:0x%x) <-- Rx-MSIX:%d, Tx-MSIX:%d\n",
121 rnpvf_queue, RING_VECTOR(rnpvf_queue), rnpvf_msix_vector,
122 rnpvf_msix_vector);
123
124 rnpvf_wr_reg(hw->ring_msix_base + RING_VECTOR(rnpvf_queue), data);
125 }
126
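The first five warnings come from a kernel-doc header that still names rnpvf_set_ivar and documents @direction, @queue and @msix_vector, none of which rnpvf_set_ring_vector() takes. The following sketches a corrected header. To keep it compilable outside the kernel, it uses hypothetical stand-ins: the u8/u32 typedefs, struct rnpvf_hw and struct rnpvf_adapter are reduced to the fields used here, and the last_write field replaces the real rnpvf_wr_reg() MMIO write and the DPRINTK() call.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel and driver types. */
typedef uint8_t u8;
typedef uint32_t u32;

struct rnpvf_hw {
	u32 vfnum;
	u32 last_write; /* replaces the register write in this sketch */
};

struct rnpvf_adapter {
	struct rnpvf_hw hw;
};

/**
 * rnpvf_set_ring_vector - map a queue's interrupt to an MSI-X vector
 * @adapter: pointer to adapter struct
 * @rnpvf_queue: queue to map the corresponding interrupt to
 * @rnpvf_msix_vector: the vector to map to the corresponding queue
 */
static void rnpvf_set_ring_vector(struct rnpvf_adapter *adapter,
				  u8 rnpvf_queue, u8 rnpvf_msix_vector)
{
	struct rnpvf_hw *hw = &adapter->hw;
	u32 data;

	/* The real driver uses the queue to pick the register offset. */
	(void)rnpvf_queue;

	data = hw->vfnum << 24;
	data |= (u32)rnpvf_msix_vector << 8; /* vector in bits 15:8 */
	data |= (u32)rnpvf_msix_vector << 0; /* same vector in bits 7:0 */
	hw->last_write = data;
}
```

Only the comment block matters for the warnings: each @name now matches a parameter in the prototype, and the summary line names the function it documents.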
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
From: Ido Schimmel <idosch(a)nvidia.com>
stable inclusion
from stable-v5.10.223
commit 36a9996e020dd5aa325e0ecc55eb2328288ea6bb
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IALEAO
CVE: CVE-2024-43880
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…
--------------------------------
[ Upstream commit 97d833ceb27dc19f8777d63f90be4a27b5daeedf ]
ACLs in Spectrum-2 and newer ASICs can reside in the algorithmic TCAM
(A-TCAM) or in the ordinary circuit TCAM (C-TCAM). The former can
contain more ACLs (i.e., tc filters), but the number of masks in each
region (i.e., tc chain) is limited.
In order to mitigate the effects of the above limitation, the device
allows filters to share a single mask if their masks only differ in up
to 8 consecutive bits. For example, dst_ip/25 can be represented using
dst_ip/24 with a delta of 1 bit. The C-TCAM does not have a limit on the
number of masks being used (and therefore does not support mask
aggregation), but can contain a limited number of filters.
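The following simplified model makes the "up to 8 consecutive bits" rule concrete. It treats a mask as a 32-bit word, as for dst_ip, and asks whether a child mask can be stored as a parent mask plus a delta confined to 8 consecutive bit positions. mask_delta_ok() is a hypothetical name. It is not the mlxsw delta_check() implementation, which works on the driver's much wider key layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model: can @child be expressed as @parent plus a delta
 * whose set bits all fall within 8 consecutive bit positions?
 */
bool mask_delta_ok(uint32_t parent, uint32_t child)
{
	uint32_t delta = child & ~parent;

	/* The child must keep every bit of the parent mask. */
	if (parent & ~child)
		return false;
	/* Identical masks leave nothing to encode as a delta. */
	if (delta == 0)
		return false;
	/* Strip trailing zeros; the remaining bits must fit in one byte. */
	while (!(delta & 1u))
		delta >>= 1;
	return delta <= 0xffu;
}
```

With dst_ip/24 as the parent (0xFFFFFF00) and dst_ip/25 as the child (0xFFFFFF80), the 1-bit delta fits. A /16 parent cannot absorb a /25 child, because the 9 extra bits do not fit in a single 8-bit window.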
The driver uses the "objagg" library to perform the mask aggregation by
passing it objects that consist of the filter's mask and whether the
filter is to be inserted into the A-TCAM or the C-TCAM since filters in
different TCAMs cannot share a mask.
The set of created objects is dependent on the insertion order of the
filters and is not necessarily optimal. Therefore, the driver will
periodically ask the library to compute a more optimal set ("hints") by
looking at all the existing objects.
When the library asks the driver whether two objects can be aggregated
the driver only compares the provided masks and ignores the A-TCAM /
C-TCAM indication. This is the right thing to do since the goal is to
move as many filters as possible to the A-TCAM. The driver also forbids
two identical masks from being aggregated since this can only happen if
one was intentionally put in the C-TCAM to avoid a conflict in the
A-TCAM.
The above can result in the following set of hints:
H1: {mask X, A-TCAM} -> H2: {mask Y, A-TCAM} // X is Y + delta
H3: {mask Y, C-TCAM} -> H4: {mask Z, A-TCAM} // Y is Z + delta
After getting the hints from the library the driver will start migrating
filters from one region to another while consulting the computed hints
and instructing the device to perform a lookup in both regions during
the transition.
Assuming a filter with mask X is being migrated into the A-TCAM in the
new region, the hints lookup will return H1. Since H2 is the parent of
H1, the library will try to find the object associated with it and
create it if necessary, in which case another (recursive) hints lookup
will be performed. This hints lookup for {mask Y, A-TCAM} will return
either H2 or H3, since the driver passes the library an object
comparison function that ignores the A-TCAM / C-TCAM indication.
This can eventually lead to nested objects which are not supported by
the library [1].
Fix by removing the object comparison function from both the driver and
the library as the driver was the only user. That way the lookup will
only return exact matches.
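The effect of dropping the custom comparison can be shown with a small linear table standing in for the hints hash table. In this sketch, struct erp_key, hint_lookup(), eq_mask_only() and eq_exact() are hypothetical simplifications, not the driver or library types. H3 is deliberately listed before H2 to model the case where the mask-only comparison hits the C-TCAM entry first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified key: the real struct mlxsw_sp_acl_erp_key
 * has a much wider mask plus the same ctcam flag. */
struct erp_key {
	unsigned char mask[4];
	bool ctcam;
};

struct hint {
	const char *name;
	struct erp_key key;
};

/* Hints from the example above (X = /25, Y = /24, Z = /23). */
static const struct hint hints[] = {
	{ "H1", { { 0xff, 0xff, 0xff, 0x80 }, false } }, /* mask X, A-TCAM */
	{ "H3", { { 0xff, 0xff, 0xff, 0x00 }, true } },  /* mask Y, C-TCAM */
	{ "H2", { { 0xff, 0xff, 0xff, 0x00 }, false } }, /* mask Y, A-TCAM */
	{ "H4", { { 0xff, 0xff, 0xfe, 0x00 }, false } }, /* mask Z, A-TCAM */
};

/* The recursive lookup from the example: {mask Y, A-TCAM}. */
static const struct erp_key query_y_atcam = { { 0xff, 0xff, 0xff, 0x00 }, false };

typedef bool (*key_eq_fn)(const struct erp_key *a, const struct erp_key *b);

/* Like the removed hints_obj_cmp(): compares the masks only. */
static bool eq_mask_only(const struct erp_key *a, const struct erp_key *b)
{
	return memcmp(a->mask, b->mask, sizeof(a->mask)) == 0;
}

/* Exact match on the whole key, as after the fix; fields are compared
 * individually so struct padding can never affect the result. */
static bool eq_exact(const struct erp_key *a, const struct erp_key *b)
{
	return eq_mask_only(a, b) && a->ctcam == b->ctcam;
}

/* Returns the name of the first hint whose key matches, or NULL. */
static const char *hint_lookup(const struct erp_key *key, key_eq_fn eq)
{
	assert(key && eq);
	for (size_t i = 0; i < sizeof(hints) / sizeof(hints[0]); i++)
		if (eq(&hints[i].key, key))
			return hints[i].name;
	return NULL;
}
```

Here hint_lookup(&query_y_atcam, eq_mask_only) returns "H3", a hint whose parent chain leads elsewhere. eq_exact returns "H2" regardless of table order. In the real hash table, which matching entry is found first is not under the caller's control.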
I do not have a reliable reproducer that can reproduce the issue in a
timely manner, but before the fix the issue would reproduce in several
minutes and with the fix it does not reproduce in over an hour.
Note that the current usefulness of the hints is limited because they
include the C-TCAM indication and represent aggregation that cannot
actually happen. This will be addressed in net-next.
[1]
WARNING: CPU: 0 PID: 153 at lib/objagg.c:170 objagg_obj_parent_assign+0xb5/0xd0
Modules linked in:
CPU: 0 PID: 153 Comm: kworker/0:18 Not tainted 6.9.0-rc6-custom-g70fbc2c1c38b #42
Hardware name: Mellanox Technologies Ltd. MSN3700C/VMOD0008, BIOS 5.11 10/10/2018
Workqueue: mlxsw_core mlxsw_sp_acl_tcam_vregion_rehash_work
RIP: 0010:objagg_obj_parent_assign+0xb5/0xd0
[...]
Call Trace:
<TASK>
__objagg_obj_get+0x2bb/0x580
objagg_obj_get+0xe/0x80
mlxsw_sp_acl_erp_mask_get+0xb5/0xf0
mlxsw_sp_acl_atcam_entry_add+0xe8/0x3c0
mlxsw_sp_acl_tcam_entry_create+0x5e/0xa0
mlxsw_sp_acl_tcam_vchunk_migrate_one+0x16b/0x270
mlxsw_sp_acl_tcam_vregion_rehash_work+0xbe/0x510
process_one_work+0x151/0x370
Fixes: 9069a3817d82 ("lib: objagg: implement optimization hints assembly and use hints for object creation")
Signed-off-by: Ido Schimmel <idosch(a)nvidia.com>
Reviewed-by: Amit Cohen <amcohen(a)nvidia.com>
Tested-by: Alexander Zubkov <green(a)qrator.net>
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Dong Chenchen <dongchenchen2(a)huawei.com>
---
.../ethernet/mellanox/mlxsw/spectrum_acl_erp.c | 13 -------------
include/linux/objagg.h | 1 -
lib/objagg.c | 15 ---------------
3 files changed, 29 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c
index d231f4d2888b..9eee229303cc 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c
@@ -1217,18 +1217,6 @@ static bool mlxsw_sp_acl_erp_delta_check(void *priv, const void *parent_obj,
return err ? false : true;
}
-static int mlxsw_sp_acl_erp_hints_obj_cmp(const void *obj1, const void *obj2)
-{
- const struct mlxsw_sp_acl_erp_key *key1 = obj1;
- const struct mlxsw_sp_acl_erp_key *key2 = obj2;
-
- /* For hints purposes, two objects are considered equal
- * in case the masks are the same. Does not matter what
- * the "ctcam" value is.
- */
- return memcmp(key1->mask, key2->mask, sizeof(key1->mask));
-}
-
static void *mlxsw_sp_acl_erp_delta_create(void *priv, void *parent_obj,
void *obj)
{
@@ -1308,7 +1296,6 @@ static void mlxsw_sp_acl_erp_root_destroy(void *priv, void *root_priv)
static const struct objagg_ops mlxsw_sp_acl_erp_objagg_ops = {
.obj_size = sizeof(struct mlxsw_sp_acl_erp_key),
.delta_check = mlxsw_sp_acl_erp_delta_check,
- .hints_obj_cmp = mlxsw_sp_acl_erp_hints_obj_cmp,
.delta_create = mlxsw_sp_acl_erp_delta_create,
.delta_destroy = mlxsw_sp_acl_erp_delta_destroy,
.root_create = mlxsw_sp_acl_erp_root_create,
diff --git a/include/linux/objagg.h b/include/linux/objagg.h
index 78021777df46..6df5b887dc54 100644
--- a/include/linux/objagg.h
+++ b/include/linux/objagg.h
@@ -8,7 +8,6 @@ struct objagg_ops {
size_t obj_size;
bool (*delta_check)(void *priv, const void *parent_obj,
const void *obj);
- int (*hints_obj_cmp)(const void *obj1, const void *obj2);
void * (*delta_create)(void *priv, void *parent_obj, void *obj);
void (*delta_destroy)(void *priv, void *delta_priv);
void * (*root_create)(void *priv, void *obj, unsigned int root_id);
diff --git a/lib/objagg.c b/lib/objagg.c
index 5e1676ccdadd..6917d5974345 100644
--- a/lib/objagg.c
+++ b/lib/objagg.c
@@ -906,20 +906,6 @@ static const struct objagg_opt_algo *objagg_opt_algos[] = {
[OBJAGG_OPT_ALGO_SIMPLE_GREEDY] = &objagg_opt_simple_greedy,
};
-static int objagg_hints_obj_cmp(struct rhashtable_compare_arg *arg,
- const void *obj)
-{
- struct rhashtable *ht = arg->ht;
- struct objagg_hints *objagg_hints =
- container_of(ht, struct objagg_hints, node_ht);
- const struct objagg_ops *ops = objagg_hints->ops;
- const char *ptr = obj;
-
- ptr += ht->p.key_offset;
- return ops->hints_obj_cmp ? ops->hints_obj_cmp(ptr, arg->key) :
- memcmp(ptr, arg->key, ht->p.key_len);
-}
-
/**
* objagg_hints_get - obtains hints instance
* @objagg: objagg instance
@@ -958,7 +944,6 @@ struct objagg_hints *objagg_hints_get(struct objagg *objagg,
offsetof(struct objagg_hints_node, obj);
objagg_hints->ht_params.head_offset =
offsetof(struct objagg_hints_node, ht_node);
- objagg_hints->ht_params.obj_cmpfn = objagg_hints_obj_cmp;
err = rhashtable_init(&objagg_hints->node_ht, &objagg_hints->ht_params);
if (err)
--
2.25.1