June 2024 - Kernel - mailweb.openeuler.org

[PATCH OLK-6.6] memcg: attach memcg async reclaim worker to curcpu
by Lu Jialin 19 Jun '24

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I9PXW6

--------------------------------

Attach the memcg async reclaim worker to the current CPU, which makes
sure the worker is scheduled within the cpumask that belongs to the
current task's cpuset.

Signed-off-by: Lu Jialin <lujialin4(a)huawei.com>
---
 mm/memcontrol.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c7958805ee66..d97410e3ec0e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2926,7 +2926,13 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 #ifdef CONFIG_MEMCG_V1_RECLAIM
 		if (is_high_async_reclaim(memcg) && !mem_high) {
 			WRITE_ONCE(memcg->high_async_reclaim, true);
-			schedule_work(&memcg->high_work);
+#ifdef CONFIG_MEMCG_SWAP_QOS
+			if (static_branch_likely(&memcg_swap_qos_key))
+				schedule_work_on(smp_processor_id(),
+						 &memcg->high_work);
+			else
+#endif
+				schedule_work(&memcg->high_work);
 			break;
 		}
 #endif
--
2.34.1
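The hunk above places an "else" directly before "#endif", so the plain
schedule_work() call serves both as the else branch and as the only
statement when CONFIG_MEMCG_SWAP_QOS is not set. A minimal userspace
sketch of that control flow follows. It is not kernel code: every
MODEL_/model_ name is a hypothetical stand-in, and the function only
reports which CPU would be requested explicitly.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for CONFIG_MEMCG_SWAP_QOS; delete to model a build without it. */
#define MODEL_MEMCG_SWAP_QOS 1

/* Hypothetical marker meaning "no particular CPU was requested". */
#define MODEL_NO_CPU_REQUESTED (-1)

/* Stand-in for the memcg_swap_qos_key static branch. */
bool model_swap_qos_enabled;

/* Returns the CPU the work is explicitly queued on, or the marker. */
int model_pick_work_cpu(int current_cpu)
{
	assert(current_cpu >= 0);

#ifdef MODEL_MEMCG_SWAP_QOS
	if (model_swap_qos_enabled)
		return current_cpu;		/* models schedule_work_on() */
	else
#endif
		return MODEL_NO_CPU_REQUESTED;	/* models schedule_work() */
}
```

With the define removed, the function collapses to the single
unconditional return, which mirrors how the patched code falls back to
schedule_work() alone.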

[PATCH openEuler-22.03-LTS-SP1] drm/amd/display: Fix division by zero in setup_dsc_config
by Jinjiang Tu 19 Jun '24

From: Jose Fernandez <josef(a)netflix.com>

stable inclusion
from stable-v5.15.160
commit a32c8f951c8a456c1c251e1dcdf21787f8066445
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IA3UT7
CVE: CVE-2024-36969

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

--------------------------------

commit 130afc8a886183a94cf6eab7d24f300014ff87ba upstream.

When slice_height is 0, the division by slice_height in the calculation
of the number of slices will cause a division by zero driver crash. This
leaves the kernel in a state that requires a reboot. This patch adds a
check to avoid the division by zero.

The stack trace below is for the 6.8.4 Kernel. I reproduced the issue on
a Z16 Gen 2 Lenovo Thinkpad with an Apple Studio Display monitor
connected via Thunderbolt. The amdgpu driver crashed with this exception
when I rebooted the system with the monitor connected.

kernel: ? die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434 arch/x86/kernel/dumpstack.c:447)
kernel: ? do_trap (arch/x86/kernel/traps.c:113 arch/x86/kernel/traps.c:154)
kernel: ? setup_dsc_config (drivers/gpu/drm/amd/amdgpu/../display/dc/dsc/dc_dsc.c:1053) amdgpu
kernel: ? do_error_trap (./arch/x86/include/asm/traps.h:58 arch/x86/kernel/traps.c:175)
kernel: ? setup_dsc_config (drivers/gpu/drm/amd/amdgpu/../display/dc/dsc/dc_dsc.c:1053) amdgpu
kernel: ? exc_divide_error (arch/x86/kernel/traps.c:194 (discriminator 2))
kernel: ? setup_dsc_config (drivers/gpu/drm/amd/amdgpu/../display/dc/dsc/dc_dsc.c:1053) amdgpu
kernel: ? asm_exc_divide_error (./arch/x86/include/asm/idtentry.h:548)
kernel: ? setup_dsc_config (drivers/gpu/drm/amd/amdgpu/../display/dc/dsc/dc_dsc.c:1053) amdgpu
kernel: dc_dsc_compute_config (drivers/gpu/drm/amd/amdgpu/../display/dc/dsc/dc_dsc.c:1109) amdgpu

After applying this patch, the driver no longer crashes when the monitor
is connected and the system is rebooted. I believe this is the same
issue reported for 3113.

Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira(a)amd.com>
Signed-off-by: Jose Fernandez <josef(a)netflix.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3113
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: "Limonciello, Mario" <mario.limonciello(a)amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>

Conflicts:
	drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c
[Context conflicts.]
Signed-off-by: Jinjiang Tu <tujinjiang(a)huawei.com>
---
 drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c b/drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c
index 4c844cfaa956..cbbcc34ac58b 100644
--- a/drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c
+++ b/drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c
@@ -735,7 +735,12 @@ static bool setup_dsc_config(
 	if (!is_dsc_possible)
 		goto done;
 
-	dsc_cfg->num_slices_v = pic_height/slice_height;
+	if (slice_height > 0) {
+		dsc_cfg->num_slices_v = pic_height / slice_height;
+	} else {
+		is_dsc_possible = false;
+		goto done;
+	}
 
 	// Final decission: can we do DSC or not?
 	if (is_dsc_possible) {
--
2.25.1
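The guard added by this patch can be tried outside the kernel. The
sketch below is a userspace model only: model_compute_num_slices_v() is
a hypothetical helper, not a function in dc_dsc.c, and the plain int
types are an assumption. A false return corresponds to the patch's
"is_dsc_possible = false; goto done;" path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Divide only when slice_height is positive. On failure the output is
 * left untouched and the caller is told DSC is not possible.
 */
bool model_compute_num_slices_v(int pic_height, int slice_height,
				int *num_slices_v)
{
	assert(num_slices_v != NULL);

	if (slice_height > 0) {
		*num_slices_v = pic_height / slice_height;
		return true;
	}

	return false;	/* would have been a divide error before the fix */
}
```

A caller would test the return value before using *num_slices_v, in the
same way setup_dsc_config() checks is_dsc_possible after the "done"
label.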

[PATCH OLK-5.10] drm/amd/display: Fix division by zero in setup_dsc_config
by Jinjiang Tu 19 Jun '24

From: Jose Fernandez <josef(a)netflix.com>

stable inclusion
from stable-v5.15.160
commit a32c8f951c8a456c1c251e1dcdf21787f8066445
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/IA3UT7
CVE: CVE-2024-36969

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

--------------------------------

commit 130afc8a886183a94cf6eab7d24f300014ff87ba upstream.

When slice_height is 0, the division by slice_height in the calculation
of the number of slices will cause a division by zero driver crash. This
leaves the kernel in a state that requires a reboot. This patch adds a
check to avoid the division by zero.

The stack trace below is for the 6.8.4 Kernel. I reproduced the issue on
a Z16 Gen 2 Lenovo Thinkpad with an Apple Studio Display monitor
connected via Thunderbolt. The amdgpu driver crashed with this exception
when I rebooted the system with the monitor connected.

kernel: ? die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434 arch/x86/kernel/dumpstack.c:447)
kernel: ? do_trap (arch/x86/kernel/traps.c:113 arch/x86/kernel/traps.c:154)
kernel: ? setup_dsc_config (drivers/gpu/drm/amd/amdgpu/../display/dc/dsc/dc_dsc.c:1053) amdgpu
kernel: ? do_error_trap (./arch/x86/include/asm/traps.h:58 arch/x86/kernel/traps.c:175)
kernel: ? setup_dsc_config (drivers/gpu/drm/amd/amdgpu/../display/dc/dsc/dc_dsc.c:1053) amdgpu
kernel: ? exc_divide_error (arch/x86/kernel/traps.c:194 (discriminator 2))
kernel: ? setup_dsc_config (drivers/gpu/drm/amd/amdgpu/../display/dc/dsc/dc_dsc.c:1053) amdgpu
kernel: ? asm_exc_divide_error (./arch/x86/include/asm/idtentry.h:548)
kernel: ? setup_dsc_config (drivers/gpu/drm/amd/amdgpu/../display/dc/dsc/dc_dsc.c:1053) amdgpu
kernel: dc_dsc_compute_config (drivers/gpu/drm/amd/amdgpu/../display/dc/dsc/dc_dsc.c:1109) amdgpu

After applying this patch, the driver no longer crashes when the monitor
is connected and the system is rebooted. I believe this is the same
issue reported for 3113.

Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira(a)amd.com>
Signed-off-by: Jose Fernandez <josef(a)netflix.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3113
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: "Limonciello, Mario" <mario.limonciello(a)amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>

Conflicts:
	drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c
[Context conflicts.]
Signed-off-by: Jinjiang Tu <tujinjiang(a)huawei.com>
---
 drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c b/drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c
index 4c844cfaa956..cbbcc34ac58b 100644
--- a/drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c
+++ b/drivers/gpu/drm/amd/display/dc/dsc/dc_dsc.c
@@ -735,7 +735,12 @@ static bool setup_dsc_config(
 	if (!is_dsc_possible)
 		goto done;
 
-	dsc_cfg->num_slices_v = pic_height/slice_height;
+	if (slice_height > 0) {
+		dsc_cfg->num_slices_v = pic_height / slice_height;
+	} else {
+		is_dsc_possible = false;
+		goto done;
+	}
 
 	// Final decission: can we do DSC or not?
 	if (is_dsc_possible) {
--
2.25.1

[PATCH OLK-5.10] memcg: attach memcg async reclaim worker to curcpu
by Lu Jialin 19 Jun '24

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I9PXW6

--------------------------------

Attach the memcg async reclaim worker to the current CPU, which makes
sure the worker is scheduled within the cpumask that belongs to the
current task's cpuset.

Signed-off-by: Lu Jialin <lujialin4(a)huawei.com>
---
 mm/memcontrol.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e6b355332ffb..3c979aa70ff6 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2844,7 +2844,13 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
 #ifdef CONFIG_MEMCG_V1_THRESHOLD_QOS
 		if (is_high_async_reclaim(memcg) && !mem_high) {
 			WRITE_ONCE(memcg->high_async_reclaim, true);
-			schedule_work(&memcg->high_work);
+#ifdef CONFIG_MEMCG_SWAP_QOS
+			if (static_branch_likely(&memcg_swap_qos_key))
+				schedule_work_on(smp_processor_id(),
+						 &memcg->high_work);
+			else
+#endif
+				schedule_work(&memcg->high_work);
 			break;
 		}
 #endif
--
2.34.1

[PATCH openEuler-1.0-LTS] can: sja1000: fix use after free in ems_pcmcia_add_card()
by Chen Zhongjin 19 Jun '24

From: Dan Carpenter <dan.carpenter(a)oracle.com>

stable inclusion
from stable-v4.19.221
commit ccf070183e4655824936c0f96c4a2bcca93419aa
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I9S254
CVE: CVE-2021-47521

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

--------------------------------

commit 3ec6ca6b1a8e64389f0212b5a1b0f6fed1909e45 upstream.

If the last channel is not available then "dev" is freed. Fortunately,
we can just use "pdev->irq" instead.

Also we should check if at least one channel was set up.

Fixes: fd734c6f25ae ("can/sja1000: add driver for EMS PCMCIA card")
Link: https://lore.kernel.org/all/20211124145041.GB13656@kili
Cc: stable(a)vger.kernel.org
Signed-off-by: Dan Carpenter <dan.carpenter(a)oracle.com>
Acked-by: Oliver Hartkopp <socketcan(a)hartkopp.net>
Tested-by: Oliver Hartkopp <socketcan(a)hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Chen Zhongjin <chenzhongjin(a)huawei.com>
---
 drivers/net/can/sja1000/ems_pcmcia.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/can/sja1000/ems_pcmcia.c b/drivers/net/can/sja1000/ems_pcmcia.c
index 381de998d2f1..fef5c59c0f4c 100644
--- a/drivers/net/can/sja1000/ems_pcmcia.c
+++ b/drivers/net/can/sja1000/ems_pcmcia.c
@@ -243,7 +243,12 @@ static int ems_pcmcia_add_card(struct pcmcia_device *pdev, unsigned long base)
 			free_sja1000dev(dev);
 	}
 
-	err = request_irq(dev->irq, &ems_pcmcia_interrupt, IRQF_SHARED,
+	if (!card->channels) {
+		err = -ENODEV;
+		goto failure_cleanup;
+	}
+
+	err = request_irq(pdev->irq, &ems_pcmcia_interrupt, IRQF_SHARED,
 			  DRV_NAME, card);
 	if (!err)
 		return 0;
--
2.25.1
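The two changes in this patch (read the IRQ from the parent device
instead of the possibly freed "dev", and fail when no channel was set
up) can be modelled in userspace. The sketch below is an illustration
under assumptions: all model_ names and the two-channel limit are
hypothetical, malloc()/free() stand in for the sja1000 device
allocation helpers, and the returned IRQ number stands in for the
request_irq() call.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

#define MODEL_MAX_CHANNELS 2

struct model_dev {
	int irq;
};

struct model_card {
	int channels;
	struct model_dev *dev[MODEL_MAX_CHANNELS];
};

/* Free every channel that was kept and reset the card. */
void model_card_release(struct model_card *card)
{
	assert(card != NULL);
	for (int i = 0; i < MODEL_MAX_CHANNELS; i++) {
		free(card->dev[i]);
		card->dev[i] = NULL;
	}
	card->channels = 0;
}

/* Returns the IRQ number to request, or a negative errno value. */
int model_add_card(struct model_card *card, int pdev_irq,
		   const bool present[MODEL_MAX_CHANNELS])
{
	struct model_dev *dev;

	assert(card != NULL && present != NULL);

	card->channels = 0;
	for (int i = 0; i < MODEL_MAX_CHANNELS; i++)
		card->dev[i] = NULL;

	for (int i = 0; i < MODEL_MAX_CHANNELS; i++) {
		dev = malloc(sizeof(*dev));
		if (!dev) {
			model_card_release(card);
			return -ENOMEM;
		}
		dev->irq = pdev_irq;

		if (present[i]) {
			card->dev[i] = dev;
			card->channels++;
		} else {
			free(dev);	/* "dev" dangles from here on */
		}
	}

	/* Second half of the fix: at least one channel must exist. */
	if (!card->channels)
		return -ENODEV;

	/* First half of the fix: never read dev->irq after the loop. */
	return pdev_irq;
}
```

If the last channel is absent, "dev" points to freed memory once the
loop ends, which is why the model returns pdev_irq rather than
dev->irq.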

[PATCH OLK-5.10 v2] sched: smart_grid: fix potential NULL pointer dereference
by Yipeng Zou 19 Jun '24

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IA6J1H
CVE: NA

----------------------------------------

There is a low probability that a kernel panic will occur when testing
with smart_grid.

The log is shown below:

[65160.746953] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[65160.746990] Unable to handle kernel NULL pointer dereference at virtual address 000000000000000
[65160.756974] Mem abort info:
[65160.766849] Mem abort info:
[65160.770660] ESR = 0x96000004
[65160.770663] EC = 0x25: DABT (current EL), IL = 32 bits
[65160.774374] ESR = 0x96000004
[65160.774377] EC = 0x25: DABT (current EL), IL = 32 bits
[65160.778428] SET = 0, FnV = 0
[65160.778430] EA = 0, S1PTW = 0
[65160.784728] SET = 0, FnV = 0
[65160.784731] EA = 0, S1PTW = 0
[65160.786018] Detected VIPT I-cache on CPU104
[65160.786070] GICv3: CPU104: found redistributor 3a0000 region 104: 0x00002000aa300000
[65160.786240] CPU104: Booted secondary processor 0x00003a0000 [0x481fd010]
[65160.788696] Data abort info:
[65160.788699] ISV = 0, ISS = 0x00000004
[65160.794911] Data abort info:
[65160.794913] ISV = 0, ISS = 0x00000004
[65160.798965] CM = 0, WnR = 0
[65160.798967] user pgtable: 4k pages, 48-bit VAs, pgdp=00000030059b2000
[65160.803102] CM = 0, WnR = 0
[65160.803104] user pgtable: 4k pages, 48-bit VAs, pgdp=00000020ab07c000
[65160.807066] [0000000000000008] pgd=0000000000000000, p4d=0000000000000000
[65160.811113] [0000000000000008] pgd=0000000000000000, p4d=0000000000000000
[65160.816199] Internal error: Oops: 0000000096000004 [#1] SMP
[65160.832723] Modules linked in:
[65161.006841] CPU: 39 PID: 195931 Comm: (hrottler) Kdump: loaded Nottainted 5.10.0 #51
[65161.016660] Hardware name: Huawei TaiShan 200 (Model 5280)/BC82AMDD, BIOS 1.08 12/14/2019
[65161.026146] pstate: 80400089 (Nzcv daIf +PAN -UAO -TCO BTYPE=--)
[65161.033199] pc : set_task_select_cpus+0x8c/0x3d0
[65161.038865] lr : select_task_rq_fair+0x1c8/0x5cc
[65161.044528] sp : ffff800172673ba0
[65161.048888] x29: ffff800172673ba0 x28: ffff00303f22cd00
[65161.055237] x27: 0000000000000027 x26: 0000000000000027
[65161.061583] x25: 0000aaaafd32b180 x24: 0000000000000002
[65161.067925] x23: ffff00303f22cd00 x22: 0000000000000000
[65161.074264] x21: ffff800172673cc4 x20: ffff00303f22cd00
[65161.080603] x19: ffff00303f22d7f4 x18: 0000000000000000
[65161.086938] x17: 0000000000000000 x16: 0000000000000000
[65161.093266] x15: 0000aaaafd365130 x14: 0000000000000000
[65161.099584] x13: 0000000000000000 x12: 0000000000000000
[65161.105892] x11: 0000000000000000 x10: 0000000000000000
[65161.112194] x9 : ffff800010129e80 x8 : 0000000000000000
[65161.118489] x7 : ffff00303f22cd00 x6 : 0000000000000001
[65161.124776] x5 : 0000000000000000 x4 : ffff8000118f5008
[65161.131058] x3 : 0000000000000000 x2 : 0000000000000002
[65161.137331] x1 : ffff800172673cc4 x0 : 0000000000000000
[65161.143598] Call trace:
[65161.147005] set_task_select_cpus+0x8c/0x3d0
[65161.152225] select_task_rq_fair+0x1c8/0x5cc
[65161.157439] sched_exec+0x94/0x1bc
[65161.161782] bprm_execve.part.0+0x60/0x164
[65161.166813] bprm_execve+0x78/0xc0
[65161.171143] do_execveat_common+0x1c4/0x250
[65161.176244] __arm64_sys_execve+0x48/0x70
[65161.181167] invoke_syscall+0x50/0x130
[65161.185824] el0_svc_common.constprop.0+0x158/0x180
[65161.191601] do_el0_svc+0x34/0xe0
[65161.195816] el0_svc+0x20/0x30
[65161.199773] el0_sync_handler+0xb8/0xc0
[65161.204502] el0_sync+0x1e8/0x200
[65161.208712] Code: d50323bf d65f03c0 f941ac00 f941c400 (f9400400)

The panic occurs when task_group(current)->auto_affinity is
dereferenced, in a scenario like:

	CPU0				CPU1
	rmdir cgroup
	free auto_affinity
					try to wake up
					select_task_rq_fair
					auto_affinity(NULL) dereference
					panic

Because nothing protects a task wakeup from racing with the removal of
the task's cgroup, we need to check whether auto_affinity is NULL in
task_prefer_cpus.

Fixes: 90ef693102cc ("sched: Fix possible deadlock in tg_set_dynamic_affinity_mode")
Signed-off-by: Yipeng Zou <zouyipeng(a)huawei.com>
---
 kernel/sched/fair.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 273f6844bc2a..a6145cc1426d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5988,7 +5988,8 @@ static void smart_grid_usage_dec(void)
 
 static inline struct cpumask *task_prefer_cpus(struct task_struct *p)
 {
-	if (!smart_grid_used())
+	if (!smart_grid_used() ||
+	    !task_group(p)->auto_affinity)
 		return p->prefer_cpus;
 
 	if (task_group(p)->auto_affinity->mode == 0)
--
2.34.1
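The early return added to task_prefer_cpus() can be modelled in
userspace as follows. This is a single-threaded sketch, so it shows the
NULL check but cannot reproduce the race between rmdir and wakeup. The
model_ types are hypothetical simplifications, CPU masks are reduced to
strings, and what the real function returns after the mode check is not
visible in the diff, so the final return is a placeholder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_auto_affinity {
	int mode;
	const char *domain_cpus;	/* placeholder for the domain mask */
};

struct model_task_group {
	struct model_auto_affinity *auto_affinity; /* NULL once freed */
};

struct model_task {
	struct model_task_group *tg;
	const char *prefer_cpus;
};

/* Stand-in for smart_grid_used(). */
bool model_smart_grid_used = true;

const char *model_task_prefer_cpus(const struct model_task *p)
{
	assert(p != NULL && p->tg != NULL);

	/* The added check: bail out before touching auto_affinity. */
	if (!model_smart_grid_used || !p->tg->auto_affinity)
		return p->prefer_cpus;

	if (p->tg->auto_affinity->mode == 0)
		return p->prefer_cpus;

	return p->tg->auto_affinity->domain_cpus;
}
```

Without the second condition, the mode comparison would read through a
NULL pointer, which matches the fault address near zero in the log
above.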

[PATCH openEuler-1.0-LTS v2] sched: smart_grid: fix potential NULL pointer dereference
by Yipeng Zou 19 Jun '24

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IA6J1H

----------------------------------------

There is a low probability that a kernel panic will occur when testing
with smart_grid.

The log is shown below:

[65160.746953] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[65160.746990] Unable to handle kernel NULL pointer dereference at virtual address 000000000000000
[65160.756974] Mem abort info:
[65160.766849] Mem abort info:
[65160.770660] ESR = 0x96000004
[65160.770663] EC = 0x25: DABT (current EL), IL = 32 bits
[65160.774374] ESR = 0x96000004
[65160.774377] EC = 0x25: DABT (current EL), IL = 32 bits
[65160.778428] SET = 0, FnV = 0
[65160.778430] EA = 0, S1PTW = 0
[65160.784728] SET = 0, FnV = 0
[65160.784731] EA = 0, S1PTW = 0
[65160.786018] Detected VIPT I-cache on CPU104
[65160.786070] GICv3: CPU104: found redistributor 3a0000 region 104: 0x00002000aa300000
[65160.786240] CPU104: Booted secondary processor 0x00003a0000 [0x481fd010]
[65160.788696] Data abort info:
[65160.788699] ISV = 0, ISS = 0x00000004
[65160.794911] Data abort info:
[65160.794913] ISV = 0, ISS = 0x00000004
[65160.798965] CM = 0, WnR = 0
[65160.798967] user pgtable: 4k pages, 48-bit VAs, pgdp=00000030059b2000
[65160.803102] CM = 0, WnR = 0
[65160.803104] user pgtable: 4k pages, 48-bit VAs, pgdp=00000020ab07c000
[65160.807066] [0000000000000008] pgd=0000000000000000, p4d=0000000000000000
[65160.811113] [0000000000000008] pgd=0000000000000000, p4d=0000000000000000
[65160.816199] Internal error: Oops: 0000000096000004 [#1] SMP
[65160.832723] Modules linked in:
[65161.006841] CPU: 39 PID: 195931 Comm: (hrottler) Kdump: loaded Nottainted 5.10.0 #51
[65161.016660] Hardware name: Huawei TaiShan 200 (Model 5280)/BC82AMDD, BIOS 1.08 12/14/2019
[65161.026146] pstate: 80400089 (Nzcv daIf +PAN -UAO -TCO BTYPE=--)
[65161.033199] pc : set_task_select_cpus+0x8c/0x3d0
[65161.038865] lr : select_task_rq_fair+0x1c8/0x5cc
[65161.044528] sp : ffff800172673ba0
[65161.048888] x29: ffff800172673ba0 x28: ffff00303f22cd00
[65161.055237] x27: 0000000000000027 x26: 0000000000000027
[65161.061583] x25: 0000aaaafd32b180 x24: 0000000000000002
[65161.067925] x23: ffff00303f22cd00 x22: 0000000000000000
[65161.074264] x21: ffff800172673cc4 x20: ffff00303f22cd00
[65161.080603] x19: ffff00303f22d7f4 x18: 0000000000000000
[65161.086938] x17: 0000000000000000 x16: 0000000000000000
[65161.093266] x15: 0000aaaafd365130 x14: 0000000000000000
[65161.099584] x13: 0000000000000000 x12: 0000000000000000
[65161.105892] x11: 0000000000000000 x10: 0000000000000000
[65161.112194] x9 : ffff800010129e80 x8 : 0000000000000000
[65161.118489] x7 : ffff00303f22cd00 x6 : 0000000000000001
[65161.124776] x5 : 0000000000000000 x4 : ffff8000118f5008
[65161.131058] x3 : 0000000000000000 x2 : 0000000000000002
[65161.137331] x1 : ffff800172673cc4 x0 : 0000000000000000
[65161.143598] Call trace:
[65161.147005] set_task_select_cpus+0x8c/0x3d0
[65161.152225] select_task_rq_fair+0x1c8/0x5cc
[65161.157439] sched_exec+0x94/0x1bc
[65161.161782] bprm_execve.part.0+0x60/0x164
[65161.166813] bprm_execve+0x78/0xc0
[65161.171143] do_execveat_common+0x1c4/0x250
[65161.176244] __arm64_sys_execve+0x48/0x70
[65161.181167] invoke_syscall+0x50/0x130
[65161.185824] el0_svc_common.constprop.0+0x158/0x180
[65161.191601] do_el0_svc+0x34/0xe0
[65161.195816] el0_svc+0x20/0x30
[65161.199773] el0_sync_handler+0xb8/0xc0
[65161.204502] el0_sync+0x1e8/0x200
[65161.208712] Code: d50323bf d65f03c0 f941ac00 f941c400 (f9400400)

The panic occurs when task_group(current)->auto_affinity is
dereferenced, in a scenario like:

	CPU0				CPU1
	rmdir cgroup
	free auto_affinity
					try to wake up
					select_task_rq_fair
					auto_affinity(NULL) dereference
					panic

Because nothing protects a task wakeup from racing with the removal of
the task's cgroup, we need to check whether auto_affinity is NULL in
task_prefer_cpus.

Fixes: 21e5d85e205f ("sched: Fix possible deadlock in tg_set_dynamic_affinity_mode")
Signed-off-by: Yipeng Zou <zouyipeng(a)huawei.com>
---
 kernel/sched/fair.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index aee13d30a7de..63f4344ac344 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5329,7 +5329,8 @@ static inline struct cpumask *task_prefer_cpus(struct task_struct *p)
 {
 	struct affinity_domain *ad;
 
-	if (!smart_grid_used())
+	if (!smart_grid_used() ||
+	    !task_group(p)->auto_affinity)
 		return p->prefer_cpus;
 
 	if (task_group(p)->auto_affinity->mode == 0)
--
2.34.1

[PATCH OLK-6.6 v2] sched: smart_grid: fix potential NULL pointer dereference
by Yipeng Zou 19 Jun '24

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/IA6J1H
CVE: NA

----------------------------------------

There is a low probability that a kernel panic will occur when testing
with smart_grid.

The log is shown below:

[65160.746953] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[65160.746990] Unable to handle kernel NULL pointer dereference at virtual address 000000000000000
[65160.756974] Mem abort info:
[65160.766849] Mem abort info:
[65160.770660] ESR = 0x96000004
[65160.770663] EC = 0x25: DABT (current EL), IL = 32 bits
[65160.774374] ESR = 0x96000004
[65160.774377] EC = 0x25: DABT (current EL), IL = 32 bits
[65160.778428] SET = 0, FnV = 0
[65160.778430] EA = 0, S1PTW = 0
[65160.784728] SET = 0, FnV = 0
[65160.784731] EA = 0, S1PTW = 0
[65160.786018] Detected VIPT I-cache on CPU104
[65160.786070] GICv3: CPU104: found redistributor 3a0000 region 104: 0x00002000aa300000
[65160.786240] CPU104: Booted secondary processor 0x00003a0000 [0x481fd010]
[65160.788696] Data abort info:
[65160.788699] ISV = 0, ISS = 0x00000004
[65160.794911] Data abort info:
[65160.794913] ISV = 0, ISS = 0x00000004
[65160.798965] CM = 0, WnR = 0
[65160.798967] user pgtable: 4k pages, 48-bit VAs, pgdp=00000030059b2000
[65160.803102] CM = 0, WnR = 0
[65160.803104] user pgtable: 4k pages, 48-bit VAs, pgdp=00000020ab07c000
[65160.807066] [0000000000000008] pgd=0000000000000000, p4d=0000000000000000
[65160.811113] [0000000000000008] pgd=0000000000000000, p4d=0000000000000000
[65160.816199] Internal error: Oops: 0000000096000004 [#1] SMP
[65160.832723] Modules linked in:
[65161.006841] CPU: 39 PID: 195931 Comm: (hrottler) Kdump: loaded Nottainted 5.10.0 #51
[65161.016660] Hardware name: Huawei TaiShan 200 (Model 5280)/BC82AMDD, BIOS 1.08 12/14/2019
[65161.026146] pstate: 80400089 (Nzcv daIf +PAN -UAO -TCO BTYPE=--)
[65161.033199] pc : set_task_select_cpus+0x8c/0x3d0
[65161.038865] lr : select_task_rq_fair+0x1c8/0x5cc
[65161.044528] sp : ffff800172673ba0
[65161.048888] x29: ffff800172673ba0 x28: ffff00303f22cd00
[65161.055237] x27: 0000000000000027 x26: 0000000000000027
[65161.061583] x25: 0000aaaafd32b180 x24: 0000000000000002
[65161.067925] x23: ffff00303f22cd00 x22: 0000000000000000
[65161.074264] x21: ffff800172673cc4 x20: ffff00303f22cd00
[65161.080603] x19: ffff00303f22d7f4 x18: 0000000000000000
[65161.086938] x17: 0000000000000000 x16: 0000000000000000
[65161.093266] x15: 0000aaaafd365130 x14: 0000000000000000
[65161.099584] x13: 0000000000000000 x12: 0000000000000000
[65161.105892] x11: 0000000000000000 x10: 0000000000000000
[65161.112194] x9 : ffff800010129e80 x8 : 0000000000000000
[65161.118489] x7 : ffff00303f22cd00 x6 : 0000000000000001
[65161.124776] x5 : 0000000000000000 x4 : ffff8000118f5008
[65161.131058] x3 : 0000000000000000 x2 : 0000000000000002
[65161.137331] x1 : ffff800172673cc4 x0 : 0000000000000000
[65161.143598] Call trace:
[65161.147005] set_task_select_cpus+0x8c/0x3d0
[65161.152225] select_task_rq_fair+0x1c8/0x5cc
[65161.157439] sched_exec+0x94/0x1bc
[65161.161782] bprm_execve.part.0+0x60/0x164
[65161.166813] bprm_execve+0x78/0xc0
[65161.171143] do_execveat_common+0x1c4/0x250
[65161.176244] __arm64_sys_execve+0x48/0x70
[65161.181167] invoke_syscall+0x50/0x130
[65161.185824] el0_svc_common.constprop.0+0x158/0x180
[65161.191601] do_el0_svc+0x34/0xe0
[65161.195816] el0_svc+0x20/0x30
[65161.199773] el0_sync_handler+0xb8/0xc0
[65161.204502] el0_sync+0x1e8/0x200
[65161.208712] Code: d50323bf d65f03c0 f941ac00 f941c400 (f9400400)

The panic occurs when task_group(current)->auto_affinity is
dereferenced, in a scenario like:

	CPU0				CPU1
	rmdir cgroup
	free auto_affinity
					try to wake up
					select_task_rq_fair
					auto_affinity(NULL) dereference
					panic

Because nothing protects a task wakeup from racing with the removal of
the task's cgroup, we need to check whether auto_affinity is NULL in
task_prefer_cpus.

Fixes: 6eb07f9925a9 ("sched: Introduce smart grid scheduling strategy for cfs")
Signed-off-by: Yipeng Zou <zouyipeng(a)huawei.com>
---
 kernel/sched/fair.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 84af50f4285f..d2efd40fb784 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6833,7 +6833,8 @@ static void smart_grid_usage_dec(void)
 
 static inline struct cpumask *task_prefer_cpus(struct task_struct *p)
 {
-	if (!smart_grid_used())
+	if (!smart_grid_used() ||
+	    !task_group(p)->auto_affinity)
 		return p->prefer_cpus;
 
 	if (task_group(p)->auto_affinity->mode == 0)
--
2.34.1

[PATCH openEuler-1.0-LTS 0/2] spi: Fix deadlock when adding SPI controllers on SPI buses
by Zeng Heng 19 Jun '24

Mark Brown (1):
  spi: Fix deadlock when adding SPI controllers on SPI buses

Zeng Heng (1):
  spi: fix kabi breakage in struct spi_controller

 drivers/spi/spi.c      | 15 +++++----------
 include/linux/device.h |  8 ++++++++
 2 files changed, 13 insertions(+), 10 deletions(-)

--
2.25.1

[openeuler:openEuler-1.0-LTS 15323/22974] drivers/net/ethernet/hisilicon/hns3/hns3_extension/hns3pf/hclge_main_it.c:181:6: sparse: sparse: symbol 'hclge_ext_init' was not declared. Should it be static?
by kernel test robot 19 Jun '24

tree:   https://gitee.com/openeuler/kernel.git openEuler-1.0-LTS
head:   9e57bb4473766dca5e26f8b78853f38dd62d1aa3
commit: c3acbb84d1aa72a112cdfb9479ae744b21a92751 [15323/22974] net: hns3: adds support for setting pf max tx rate via sysfs
config: arm64-randconfig-r111-20240615 (https://download.01.org/0day-ci/archive/20240619/202406191251.tnS3pNVS-lkp@…)
compiler: aarch64-linux-gcc (GCC) 13.2.0
reproduce: (https://download.01.org/0day-ci/archive/20240619/202406191251.tnS3pNVS-lkp@…)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202406191251.tnS3pNVS-lkp@intel.com/

sparse warnings: (new ones prefixed by >>)
   drivers/net/ethernet/hisilicon/hns3/hns3_extension/hns3pf/hclge_main_it.c:140:44: sparse: sparse: mixing different enum types:
   drivers/net/ethernet/hisilicon/hns3/hns3_extension/hns3pf/hclge_main_it.c:140:44: sparse:    unsigned int enum hnae3_reset_type
   drivers/net/ethernet/hisilicon/hns3/hns3_extension/hns3pf/hclge_main_it.c:140:44: sparse:    unsigned int enum hnae3_event_type_custom
   drivers/net/ethernet/hisilicon/hns3/hns3_extension/hns3pf/hclge_main_it.c:97:6: sparse: sparse: symbol 'hclge_reset_event_it' was not declared. Should it be static?
   drivers/net/ethernet/hisilicon/hns3/hns3_extension/hns3pf/hclge_main_it.c:148:6: sparse: sparse: symbol 'hclge_reset_done_it' was not declared. Should it be static?
>> drivers/net/ethernet/hisilicon/hns3/hns3_extension/hns3pf/hclge_main_it.c:181:6: sparse: sparse: symbol 'hclge_ext_init' was not declared. Should it be static?
>> drivers/net/ethernet/hisilicon/hns3/hns3_extension/hns3pf/hclge_main_it.c:186:6: sparse: sparse: symbol 'hclge_ext_uninit' was not declared. Should it be static?
>> drivers/net/ethernet/hisilicon/hns3/hns3_extension/hns3pf/hclge_main_it.c:195:6: sparse: sparse: symbol 'hclge_ext_reset_done' was not declared. Should it be static?
   drivers/net/ethernet/hisilicon/hns3/hns3_extension/hns3pf/hclge_main_it.c:204:5: sparse: sparse: symbol 'hclge_init_it' was not declared. Should it be static?
   drivers/net/ethernet/hisilicon/hns3/hns3_extension/hns3pf/hclge_main_it.c:97:6: warning: no previous prototype for 'hclge_reset_event_it' [-Wmissing-prototypes]
      97 | void hclge_reset_event_it(struct pci_dev *pdev, struct hnae3_handle *handle)
         |      ^~~~~~~~~~~~~~~~~~~~
   drivers/net/ethernet/hisilicon/hns3/hns3_extension/hns3pf/hclge_main_it.c: In function 'hclge_reset_event_it':
   drivers/net/ethernet/hisilicon/hns3/hns3_extension/hns3pf/hclge_main_it.c:140:44: warning: implicit conversion from 'enum hnae3_reset_type' to 'enum hnae3_event_type_custom' [-Wenum-conversion]
     140 |         nic_call_event(netdev, hdev->reset_level);
         |                                ~~~~^~~~~~~~~~~~~
   drivers/net/ethernet/hisilicon/hns3/hns3_extension/hns3pf/hclge_main_it.c: At top level:
   drivers/net/ethernet/hisilicon/hns3/hns3_extension/hns3pf/hclge_main_it.c:148:6: warning: no previous prototype for 'hclge_reset_done_it' [-Wmissing-prototypes]
     148 | bool hclge_reset_done_it(struct hnae3_handle *handle, bool done)
         |      ^~~~~~~~~~~~~~~~~~~
   drivers/net/ethernet/hisilicon/hns3/hns3_extension/hns3pf/hclge_main_it.c:181:6: warning: no previous prototype for 'hclge_ext_init' [-Wmissing-prototypes]
     181 | void hclge_ext_init(struct hnae3_handle *handle)
         |      ^~~~~~~~~~~~~~
   drivers/net/ethernet/hisilicon/hns3/hns3_extension/hns3pf/hclge_main_it.c:186:6: warning: no previous prototype for 'hclge_ext_uninit' [-Wmissing-prototypes]
     186 | void hclge_ext_uninit(struct hnae3_handle *handle)
         |      ^~~~~~~~~~~~~~~~
   drivers/net/ethernet/hisilicon/hns3/hns3_extension/hns3pf/hclge_main_it.c:195:6: warning: no previous prototype for 'hclge_ext_reset_done' [-Wmissing-prototypes]
     195 | void hclge_ext_reset_done(struct hnae3_handle *handle)
         |      ^~~~~~~~~~~~~~~~~~~~
   drivers/net/ethernet/hisilicon/hns3/hns3_extension/hns3pf/hclge_main_it.c:204:5: warning: no previous prototype for 'hclge_init_it' [-Wmissing-prototypes]
     204 | int hclge_init_it(void)
         |     ^~~~~~~~~~~~~

vim +/hclge_ext_init +181 drivers/net/ethernet/hisilicon/hns3/hns3_extension/hns3pf/hclge_main_it.c

   179	
   180	#ifdef CONFIG_HNS3_TEST
 > 181	void hclge_ext_init(struct hnae3_handle *handle)
   182	{
   183		hclge_sysfs_init(handle);
   184	}
   185	
 > 186	void hclge_ext_uninit(struct hnae3_handle *handle)
   187	{
   188		struct hclge_vport *vport = hclge_get_vport(handle);
   189		struct hclge_dev *hdev = vport->back;
   190	
   191		hclge_reset_pf_rate(hdev);
   192		hclge_sysfs_uninit(handle);
   193	}
   194	
 > 195	void hclge_ext_reset_done(struct hnae3_handle *handle)
   196	{
   197		struct hclge_vport *vport = hclge_get_vport(handle);
   198		struct hclge_dev *hdev = vport->back;
   199	
   200		hclge_resume_pf_rate(hdev);
   201	}
   202	#endif
   203	

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
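For readers unfamiliar with this class of warning: "was not declared.
Should it be static?" and -Wmissing-prototypes both fire when a
non-static function is defined without a prototype in scope. The sketch
below shows the two usual remedies in a standalone file. It is an
illustration only; the model_ names are hypothetical, and the report
does not say whether the hclge_ext_* functions are called from other
files, so it does not show which remedy fits them.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Remedy 1: a prototype is visible before the definition. In a real
 * tree this line would live in a header included by the .c file.
 */
void model_ext_init(int *init_count);

/* Remedy 2: a helper used only in this file gets internal linkage. */
static void model_sysfs_init(int *init_count)
{
	assert(init_count != NULL);
	(*init_count)++;	/* placeholder for the real setup work */
}

void model_ext_init(int *init_count)
{
	model_sysfs_init(init_count);
}
```

Neither function in this sketch would trigger the warnings quoted
above, since one has a prior declaration and the other is static.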
