From: Thinh Tran <thinhtr@linux.vnet.ibm.com>
stable inclusion
from stable-v5.10.209
commit 1059aa41c5a84abfab4cc7371d6b5ff2b30b6c2d
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9J6AL
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=…
-------------------------
[ Upstream commit 16b55b1f2269962fb6b5154b8bf43f37c9a96637 ]
When an EEH error is encountered by a PCI adapter, the EEH driver
modifies the PCI channel's state as shown below:
enum {
/* I/O channel is in normal state */
pci_channel_io_normal = (__force pci_channel_state_t) 1,
/* I/O to channel is blocked */
pci_channel_io_frozen = (__force pci_channel_state_t) 2,
/* PCI card is dead */
pci_channel_io_perm_failure = (__force pci_channel_state_t) 3,
};
If the same EEH error then causes the tg3 driver's transmit timeout
logic to execute, the tg3_tx_timeout() function schedules a reset
task via tg3_reset_task_schedule(), which may cause a race condition
between the tg3 and EEH driver as both attempt to recover the HW via
a reset action.
         Thread #1                          Thread #2

 EEH driver gets error event
 --> eeh_set_channel_state()
     and set device to one of
     the error states above         scheduler: tg3_reset_task() gets
                                    returned error from tg3_init_hw()
                                    --> dev_close() shuts down the
                                        interface
 tg3_io_slot_reset() and
 tg3_io_resume() fail to
 reset/resume the device
To resolve this issue, we avoid the race condition by checking the PCI
channel state in the tg3_reset_task() function and skip the tg3 driver
initiated reset when the PCI channel is not in the normal state. (The
driver has no access to tg3 device registers at this point and cannot
even complete the reset task successfully without external assistance.)
We'll leave the reset procedure to be managed by the EEH driver which
calls the tg3_io_error_detected(), tg3_io_slot_reset() and
tg3_io_resume() functions as appropriate.
Add the same check in tg3_dump_state() to avoid dumping all
device registers when the PCI channel is not in the normal state.
Signed-off-by: Thinh Tran <thinhtr@linux.vnet.ibm.com>
Tested-by: Venkata Sai Duggi <venkata.sai.duggi@ibm.com>
Reviewed-by: David Christensen <drc@linux.vnet.ibm.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20231201001911.656-1-thinhtr@linux.vnet.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Baogen Shang <baogen.shang@windriver.com>
---
drivers/net/ethernet/broadcom/tg3.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
index 4e74a3d44d1e..56ca913f0c2d 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -6454,6 +6454,14 @@ static void tg3_dump_state(struct tg3 *tp)
int i;
u32 *regs;
+ /* If it is a PCI error, all registers will be 0xffff,
+ * we don't dump them out, just report the error and return
+ */
+ if (tp->pdev->error_state != pci_channel_io_normal) {
+ netdev_err(tp->dev, "PCI channel ERROR!\n");
+ return;
+ }
+
regs = kzalloc(TG3_REG_BLK_SIZE, GFP_ATOMIC);
if (!regs)
return;
@@ -11195,7 +11203,8 @@ static void tg3_reset_task(struct work_struct *work)
rtnl_lock();
tg3_full_lock(tp, 0);
- if (tp->pcierr_recovery || !netif_running(tp->dev)) {
+ if (tp->pcierr_recovery || !netif_running(tp->dev) ||
+ tp->pdev->error_state != pci_channel_io_normal) {
tg3_flag_clear(tp, RESET_TASK_PENDING);
tg3_full_unlock(tp);
rtnl_unlock();
--
2.33.0
From: Zhipeng Lu <alexious@zju.edu.cn>
stable inclusion
from stable-v5.10.209
commit aeed2b4e4a70c7568d4a5eecd6a109713c0dfbf4
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I9J6AL
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=…
-------------------------
[ Upstream commit ac16667237a82e2597e329eb9bc520d1cf9dff30 ]
When the allocation of
adev->pm.dpm.dyn_state.vddc_dependency_on_dispclk.entries fails,
amdgpu_free_extended_power_table() is called to free some fields of adev.
However, when control flow returns to si_dpm_sw_init(), it jumps to the
dpm_failed label and calls si_dpm_fini(), which calls
amdgpu_free_extended_power_table() again and frees those fields a second
time. Thus a double-free is triggered.
Fixes: 841686df9f7d ("drm/amdgpu: add SI DPM support (v4)")
Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Baogen Shang <baogen.shang@windriver.com>
---
drivers/gpu/drm/amd/pm/powerplay/si_dpm.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/pm/powerplay/si_dpm.c b/drivers/gpu/drm/amd/pm/powerplay/si_dpm.c
index d6544a6dabc7..6f0653c81f8f 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/si_dpm.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/si_dpm.c
@@ -7349,10 +7349,9 @@ static int si_dpm_init(struct amdgpu_device *adev)
kcalloc(4,
sizeof(struct amdgpu_clock_voltage_dependency_entry),
GFP_KERNEL);
- if (!adev->pm.dpm.dyn_state.vddc_dependency_on_dispclk.entries) {
- amdgpu_free_extended_power_table(adev);
+ if (!adev->pm.dpm.dyn_state.vddc_dependency_on_dispclk.entries)
return -ENOMEM;
- }
+
adev->pm.dpm.dyn_state.vddc_dependency_on_dispclk.count = 4;
adev->pm.dpm.dyn_state.vddc_dependency_on_dispclk.entries[0].clk = 0;
adev->pm.dpm.dyn_state.vddc_dependency_on_dispclk.entries[0].v = 0;
--
2.33.0
hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I9OJK9
CVE: NA
----------------------------------------
In many scenarios we tested with smart_grid, we found that the first
domain level is key to the benchmark results.
The reason is that factors such as interrupt affinity and memory
affinity can have a big impact on the test.
Before this patch, the first domain level was unchangeable after
creation.
This patch introduces 'cpu.rebuild_affinity_domain' to dynamically
reconfigure all domain levels.
Typical use case:
echo $cpu_id > cpu.rebuild_affinity_domain
The cpu_id selects the CPU whose topology is used to rebuild the first
domain level.
If we set cpu_id = 34, we can see a change like:
  ----------------            -----------------
 | level 0 (0-31) |          | level 0 (32-63) |
  ----------------            -----------------
         v                            v
  ----------------            -----------------
 | level 1 (0-63) |    -->   | level 1 (0-63)  |
  ----------------            -----------------
         v                            v
  ----------------            -----------------
 | level 2 (0-95) |          | level 2 (0-95)  |
  ----------------            -----------------
         v                            v
  -----------------           -----------------
 | level 3 (0-127) |         | level 3 (0-127) |
  -----------------           -----------------
There are a number of constraints on the rebuild feature:
1. The domain can only be rebuilt while auto mode is disabled.
(cpu.dynamic_affinity_mode == 1)
2. Rebuild is only allowed on an active and housekeeping CPU.
(offline and isolated CPUs are forbidden)
3. This file is write-only.
Signed-off-by: Yipeng Zou <zouyipeng@huawei.com>
---
kernel/sched/core.c | 13 +++++++++++++
kernel/sched/fair.c | 43 +++++++++++++++++++++++++++++++++++++++++++
kernel/sched/sched.h | 1 +
3 files changed, 57 insertions(+)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index fa71c7c51196..77dc6e0e3f8b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9690,6 +9690,15 @@ static int cpu_affinity_stat_show(struct seq_file *sf, void *v)
return 0;
}
+
+static int cpu_rebuild_affinity_domain_u64(struct cgroup_subsys_state *css,
+ struct cftype *cftype,
+ u64 cpu)
+{
+ struct task_group *tg = css_tg(css);
+
+ return tg_rebuild_affinity_domains(cpu, tg->auto_affinity);
+}
#endif /* CONFIG_QOS_SCHED_SMART_GRID */
#ifdef CONFIG_QOS_SCHED
@@ -9873,6 +9882,10 @@ static struct cftype cpu_legacy_files[] = {
.name = "affinity_stat",
.seq_show = cpu_affinity_stat_show,
},
+ {
+ .name = "rebuild_affinity_domain",
+ .write_u64 = cpu_rebuild_affinity_domain_u64,
+ },
#endif
#ifdef CONFIG_CFS_BANDWIDTH
{
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f39e7547523c..1458878f5464 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6242,6 +6242,49 @@ static void destroy_auto_affinity(struct task_group *tg)
kfree(tg->auto_affinity);
tg->auto_affinity = NULL;
}
+
+int tg_rebuild_affinity_domains(int cpu, struct auto_affinity *auto_affi)
+{
+ int ret = 0;
+ int level = 0;
+ struct sched_domain *tmp;
+
+ if (unlikely(!auto_affi))
+ return -EPERM;
+
+ mutex_lock(&smart_grid_used_mutex);
+ raw_spin_lock_irq(&auto_affi->lock);
+ /* Only build domain while auto mode disabled */
+ if (auto_affi->mode) {
+ ret = -EPERM;
+ goto unlock_all;
+ }
+
+ /* Only build on active and housekeeping cpu */
+ if (!cpu_active(cpu) || !housekeeping_cpu(cpu, HK_FLAG_DOMAIN)) {
+ ret = -EINVAL;
+ goto unlock_all;
+ }
+
+ for_each_domain(cpu, tmp) {
+ if (!auto_affi->ad.domains[level] || !auto_affi->ad.domains_orig[level])
+ continue;
+
+ /* rebuild domain[,_orig] and reset schedstat counter */
+ cpumask_copy(auto_affi->ad.domains[level], sched_domain_span(tmp));
+ cpumask_copy(auto_affi->ad.domains_orig[level], auto_affi->ad.domains[level]);
+ __schedstat_set(auto_affi->ad.stay_cnt[level], 0);
+ level++;
+ }
+
+ /* trigger to update smart grid zone */
+ sched_grid_zone_update(false);
+
+unlock_all:
+ raw_spin_unlock_irq(&auto_affi->lock);
+ mutex_unlock(&smart_grid_used_mutex);
+ return ret;
+}
#else
static void destroy_auto_affinity(struct task_group *tg) {}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index e6f934af7062..e10f65a7f87f 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -580,6 +580,7 @@ extern void start_auto_affinity(struct auto_affinity *auto_affi);
extern void stop_auto_affinity(struct auto_affinity *auto_affi);
extern int init_auto_affinity(struct task_group *tg);
extern void tg_update_affinity_domains(int cpu, int online);
+extern int tg_rebuild_affinity_domains(int cpu, struct auto_affinity *auto_affi);
#else
static inline int init_auto_affinity(struct task_group *tg)
--
2.34.1