mailweb.openeuler.org
Kernel

kernel@openeuler.org

  • 70 participants
  • 19491 discussions
[openeuler:openEuler-1.0-LTS 1740/1740] mm/mem_reliable.c:126:5: sparse: sparse: symbol 'reliable_limit_handler' was not declared. Should it be static?
by kernel test robot 31 Jul '25

tree:   https://gitee.com/openeuler/kernel.git openEuler-1.0-LTS
head:   859de5033e15abefcf19935429e6478be97d889a
commit: 1845e7add95773a24019fb72bbea24a0a568663b [1740/1740] mm: Add reliable memory use limit for user tasks
config: arm64-randconfig-r121-20250729 (https://download.01.org/0day-ci/archive/20250731/202507310153.fW2hs6vO-lkp@…)
compiler: aarch64-linux-gcc (GCC) 15.1.0
reproduce: (https://download.01.org/0day-ci/archive/20250731/202507310153.fW2hs6vO-lkp@…)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202507310153.fW2hs6vO-lkp@intel.com/

sparse warnings: (new ones prefixed by >>)
>> mm/mem_reliable.c:126:5: sparse: sparse: symbol 'reliable_limit_handler' was not declared. Should it be static?

   mm/mem_reliable.c:126:5: warning: no previous prototype for 'reliable_limit_handler' [-Wmissing-prototypes]
     126 | int reliable_limit_handler(struct ctl_table *table, int write,
         |     ^~~~~~~~~~~~~~~~~~~~~~

vim +/reliable_limit_handler +126 mm/mem_reliable.c

   124	
   125	#ifdef CONFIG_SYSCTL
 > 126	int reliable_limit_handler(struct ctl_table *table, int write,
   127		void __user *buffer, size_t *length, loff_t *ppos)
   128	{
   129		unsigned long old = task_reliable_limit;
   130		int ret;
   131	
   132		ret = proc_doulongvec_minmax(table, write, buffer, length, ppos);
   133		if (ret == 0 && write) {
   134			if (task_reliable_limit > total_reliable_mem_sz()) {
   135				task_reliable_limit = old;
   136				return -EINVAL;
   137			}
   138		}
   139	
   140		return ret;
   141	}
   142	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
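For readers unfamiliar with this class of warning: "was not declared. Should it be static?" and -Wmissing-prototypes both mean that a function with external linkage is defined without a prior declaration. The usual remedies are to give the function internal linkage if nothing outside the file uses it, or to declare it in a header that the defining file includes. The following is only a userspace sketch of those two options, not the actual fix for mm/mem_reliable.c; every name in it (sk_task_limit, sk_total_size, sk_set_task_limit, sk_get_task_limit) is hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the tunable the reported handler guards. */
static unsigned long sk_task_limit = 100;

/*
 * Option A: a helper used only in this file gets internal linkage, so
 * neither sparse nor -Wmissing-prototypes asks for a declaration.
 */
static unsigned long sk_total_size(void)
{
	return 1000;	/* hypothetical upper bound */
}

/*
 * Option B: functions used from other files keep external linkage, but a
 * prototype must be visible before the definition. In the kernel these
 * declarations would live in a shared header.
 */
int sk_set_task_limit(unsigned long new_limit);
unsigned long sk_get_task_limit(void);

int sk_set_task_limit(unsigned long new_limit)
{
	unsigned long old = sk_task_limit;

	sk_task_limit = new_limit;
	if (sk_task_limit > sk_total_size()) {
		/* Roll back on an out-of-range value, like the reported handler. */
		sk_task_limit = old;
		return -EINVAL;
	}
	return 0;
}

unsigned long sk_get_task_limit(void)
{
	assert(sk_task_limit <= sk_total_size());
	return sk_task_limit;
}
```

Which option applies to reliable_limit_handler depends on whether a sysctl table in another file refers to it, which the report does not show.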
[openeuler:openEuler-1.0-LTS 1740/1740] include/linux/mem_reliable.h:41:15: sparse: sparse: restricted gfp_t degrades to integer
by kernel test robot 30 Jul '25

tree:   https://gitee.com/openeuler/kernel.git openEuler-1.0-LTS
head:   859de5033e15abefcf19935429e6478be97d889a
commit: 33d1f46ad98ea3a13752a6360d97732ab4e119b9 [1740/1740] mm: Introduce memory reliable
config: arm64-randconfig-r121-20250729 (https://download.01.org/0day-ci/archive/20250730/202507302110.c40p4uEx-lkp@…)
compiler: aarch64-linux-gcc (GCC) 15.1.0
reproduce: (https://download.01.org/0day-ci/archive/20250730/202507302110.c40p4uEx-lkp@…)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202507302110.c40p4uEx-lkp@intel.com/

sparse warnings: (new ones prefixed by >>)
   mm/page_alloc.c:140:1: sparse: sparse: symbol 'pcpu_drain_mutex' was not declared. Should it be static?
   mm/page_alloc.c: note: in included file (through include/linux/mm.h):
>> include/linux/mem_reliable.h:41:15: sparse: sparse: restricted gfp_t degrades to integer
   mm/page_alloc.c:4572:13: sparse: sparse: restricted gfp_t degrades to integer
   mm/page_alloc.c:4573:27: sparse: sparse: invalid assignment: |=
   mm/page_alloc.c:4573:27: sparse:    left side has type restricted gfp_t
   mm/page_alloc.c:4573:27: sparse:    right side has type unsigned int
   mm/page_alloc.c: note: in included file (through include/linux/mm.h):
   include/linux/gfp.h:457:34: sparse: sparse: restricted gfp_t degrades to integer
   include/linux/gfp.h:324:27: sparse: sparse: restricted gfp_t degrades to integer
   include/linux/gfp.h:324:27: sparse: sparse: restricted gfp_t degrades to integer
   include/linux/gfp.h:457:34: sparse: sparse: restricted gfp_t degrades to integer
   include/linux/gfp.h:457:34: sparse: sparse: restricted gfp_t degrades to integer

   mm/page_alloc.c: In function 'mem_init_print_info':
   mm/page_alloc.c:7373:27: warning: comparison between two arrays [-Warray-compare]
    7373 |                 if (start <= pos && pos < end && size > adj)
         |                           ^~
   mm/page_alloc.c:7377:9: note: in expansion of macro 'adj_init_size'
    7377 |         adj_init_size(__init_begin, __init_end, init_data_size,
         |         ^~~~~~~~~~~~~
   mm/page_alloc.c:7373:27: note: use '&__init_begin[0] <= &_sinittext[0]' to compare the addresses
    7373 |                 if (start <= pos && pos < end && size > adj)
         |                           ^~
   mm/page_alloc.c:7377:9: note: in expansion of macro 'adj_init_size'
    7377 |         adj_init_size(__init_begin, __init_end, init_data_size,
         |         ^~~~~~~~~~~~~
   mm/page_alloc.c:7373:41: warning: comparison between two arrays [-Warray-compare]
    7373 |                 if (start <= pos && pos < end && size > adj)
         |                                         ^
   mm/page_alloc.c:7377:9: note: in expansion of macro 'adj_init_size'
    7377 |         adj_init_size(__init_begin, __init_end, init_data_size,
         |         ^~~~~~~~~~~~~
   mm/page_alloc.c:7373:41: note: use '&_sinittext[0] < &__init_end[0]' to compare the addresses
    7373 |                 if (start <= pos && pos < end && size > adj)
         |                                         ^
   mm/page_alloc.c:7377:9: note: in expansion of macro 'adj_init_size'
    7377 |         adj_init_size(__init_begin, __init_end, init_data_size,
         |         ^~~~~~~~~~~~~
   mm/page_alloc.c:7373:27: warning: comparison between two arrays [-Warray-compare]
    7373 |                 if (start <= pos && pos < end && size > adj)
         |                           ^~
   mm/page_alloc.c:7379:9: note: in expansion of macro 'adj_init_size'
    7379 |         adj_init_size(_stext, _etext, codesize, _sinittext, init_code_size);
         |         ^~~~~~~~~~~~~
   mm/page_alloc.c:7373:27: note: use '&_stext[0] <= &_sinittext[0]' to compare the addresses
    7373 |                 if (start <= pos && pos < end && size > adj)
         |                           ^~
   mm/page_alloc.c:7379:9: note: in expansion of macro 'adj_init_size'
    7379 |         adj_init_size(_stext, _etext, codesize, _sinittext, init_code_size);
         |         ^~~~~~~~~~~~~
   mm/page_alloc.c:7373:41: warning: comparison between two arrays [-Warray-compare]
    7373 |                 if (start <= pos && pos < end && size > adj)
         |                                         ^
   mm/page_alloc.c:7379:9: note: in expansion of macro 'adj_init_size'
    7379 |         adj_init_size(_stext, _etext, codesize, _sinittext, init_code_size);
         |         ^~~~~~~~~~~~~
   mm/page_alloc.c:7373:41: note: use '&_sinittext[0] < &_etext[0]' to compare the addresses
    7373 |                 if (start <= pos && pos < end && size > adj)
         |                                         ^
   mm/page_alloc.c:7379:9: note: in expansion of macro 'adj_init_size'
    7379 |         adj_init_size(_stext, _etext, codesize, _sinittext, init_code_size);
         |         ^~~~~~~~~~~~~
   mm/page_alloc.c:7373:27: warning: comparison between two arrays [-Warray-compare]
    7373 |                 if (start <= pos && pos < end && size > adj)
         |                           ^~
   mm/page_alloc.c:7380:9: note: in expansion of macro 'adj_init_size'
    7380 |         adj_init_size(_sdata, _edata, datasize, __init_begin, init_data_size);
         |         ^~~~~~~~~~~~~
   mm/page_alloc.c:7373:27: note: use '&_sdata[0] <= &__init_begin[0]' to compare the addresses
    7373 |                 if (start <= pos && pos < end && size > adj)
         |                           ^~
   mm/page_alloc.c:7380:9: note: in expansion of macro 'adj_init_size'
    7380 |         adj_init_size(_sdata, _edata, datasize, __init_begin, init_data_size);
         |         ^~~~~~~~~~~~~
   mm/page_alloc.c:7373:41: warning: comparison between two arrays [-Warray-compare]
    7373 |                 if (start <= pos && pos < end && size > adj)
         |                                         ^
   mm/page_alloc.c:7380:9: note: in expansion of macro 'adj_init_size'
    7380 |         adj_init_size(_sdata, _edata, datasize, __init_begin, init_data_size);
         |         ^~~~~~~~~~~~~
   mm/page_alloc.c:7373:41: note: use '&__init_begin[0] < &_edata[0]' to compare the addresses
    7373 |                 if (start <= pos && pos < end && size > adj)
         |                                         ^
   mm/page_alloc.c:7380:9: note: in expansion of macro 'adj_init_size'
    7380 |         adj_init_size(_sdata, _edata, datasize, __init_begin, init_data_size);
         |         ^~~~~~~~~~~~~
   mm/page_alloc.c:7373:27: warning: comparison between two arrays [-Warray-compare]
    7373 |                 if (start <= pos && pos < end && size > adj)
         |                           ^~
   mm/page_alloc.c:7381:9: note: in expansion of macro 'adj_init_size'
    7381 |         adj_init_size(_stext, _etext, codesize, __start_rodata, rosize);
         |         ^~~~~~~~~~~~~
   mm/page_alloc.c:7373:27: note: use '&_stext[0] <= &__start_rodata[0]' to compare the addresses
    7373 |                 if (start <= pos && pos < end && size > adj)
         |                           ^~
   mm/page_alloc.c:7381:9: note: in expansion of macro 'adj_init_size'
    7381 |         adj_init_size(_stext, _etext, codesize, __start_rodata, rosize);
         |         ^~~~~~~~~~~~~
   mm/page_alloc.c:7373:41: warning: comparison between two arrays [-Warray-compare]
    7373 |                 if (start <= pos && pos < end && size > adj)
         |                                         ^
   mm/page_alloc.c:7381:9: note: in expansion of macro 'adj_init_size'
    7381 |         adj_init_size(_stext, _etext, codesize, __start_rodata, rosize);

vim +41 include/linux/mem_reliable.h

    31	
    32	static inline bool skip_none_movable_zone(gfp_t gfp, struct zoneref *z)
    33	{
    34		if (!mem_reliable_is_enabled())
    35			return false;
    36	
    37		if (!current->mm || (current->flags & PF_KTHREAD))
    38			return false;
    39	
    40		/* user tasks can only alloc memory from non-mirrored region */
  > 41		if (!(gfp & ___GFP_RELIABILITY) && (gfp & __GFP_HIGHMEM) &&
    42		    (gfp & __GFP_MOVABLE)) {
    43			if (zonelist_zone_idx(z) < ZONE_MOVABLE)
    44				return true;
    45		}
    46	
    47		return false;
    48	}
    49	#else
    50	#define reliable_enabled		0
    51	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
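Some background on the flagged line may help. sparse treats gfp_t as a "bitwise" (restricted) type, and it reports "degrades to integer" when such a value is combined with a plain integer. On line 41 the gfp argument is masked with ___GFP_RELIABILITY, and in mainline gfp.h the triple-underscore names are the raw integer constants while the double-underscore names are the gfp_t-typed flags. One plausible fix is therefore to test against a typed flag instead. The sketch below illustrates that pattern in userspace under stated assumptions: the SK_* macros, the sk_gfp_t type, both helper functions, and all bit values are hypothetical placeholders, not the openEuler definitions.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * sparse defines __CHECKER__ and understands these attributes; a regular
 * compiler sees plain integers. SK_BITWISE and SK_FORCE play the role of
 * the kernel's __bitwise and __force annotations.
 */
#ifdef __CHECKER__
#define SK_BITWISE	__attribute__((bitwise))
#define SK_FORCE	__attribute__((force))
#else
#define SK_BITWISE
#define SK_FORCE
#endif

typedef unsigned int SK_BITWISE sk_gfp_t;

/* Raw bit values: plain integers. All values here are placeholders. */
#define SK_RAW_HIGHMEM		0x02u
#define SK_RAW_MOVABLE		0x08u
#define SK_RAW_RELIABLE		0x100u
#define SK_RAW_ALL		(SK_RAW_HIGHMEM | SK_RAW_MOVABLE | SK_RAW_RELIABLE)

/* Typed flags: the only form that should be combined with sk_gfp_t. */
#define SK_GFP_HIGHMEM		((SK_FORCE sk_gfp_t)SK_RAW_HIGHMEM)
#define SK_GFP_MOVABLE		((SK_FORCE sk_gfp_t)SK_RAW_MOVABLE)
#define SK_GFP_RELIABLE		((SK_FORCE sk_gfp_t)SK_RAW_RELIABLE)

/* Convert raw bits to the typed form in one audited place. */
sk_gfp_t sk_make_flags(unsigned int raw)
{
	assert((raw & ~SK_RAW_ALL) == 0);	/* reject unknown bits */
	return (SK_FORCE sk_gfp_t)raw;
}

/* Same shape as the condition on line 41, with typed flags on both sides. */
bool sk_wants_non_mirrored(sk_gfp_t gfp)
{
	return !(gfp & SK_GFP_RELIABLE) &&
	       (gfp & SK_GFP_HIGHMEM) &&
	       (gfp & SK_GFP_MOVABLE);
}
```

The separate -Warray-compare notes in the report already spell out their own remedy: compare &a[0] with &b[0] rather than the array names.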
[PATCH OLK-6.6 V2] sched: Support NUMA parallel scheduling for multiple processes
by Cheng Yu 30 Jul '25

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/ICBBNL

--------------------------------

For architectures with multiple NUMA node levels and large distances
between nodes, a better approach is to support processes running in
parallel on each NUMA node.

The usage is restricted to the following scenarios:
1. No CPU binding for user-space processes;
2. It is applicable to distributed applications, such as business
   architectures with one master and multiple slaves running in
   parallel;
3. The existing "qos dynamic affinity" and "qos smart grid" features
   must not be used simultaneously.

Signed-off-by: Cheng Yu <serein.chengyu(a)huawei.com>
---
 arch/arm64/Kconfig                     |  1 +
 arch/arm64/configs/openeuler_defconfig |  1 +
 arch/arm64/include/asm/prefer_numa.h   | 13 +++++
 arch/arm64/kernel/Makefile             |  1 +
 arch/arm64/kernel/prefer_numa.c        | 72 ++++++++++++++++++++++++++
 fs/proc/array.c                        |  3 --
 include/linux/perf_event.h             |  2 +
 include/linux/sched.h                  |  6 +++
 init/Kconfig                           | 22 ++++++++
 kernel/cgroup/cpuset.c                 |  6 ++-
 kernel/events/core.c                   | 13 +++++
 kernel/fork.c                          | 11 ++--
 kernel/sched/core.c                    |  3 --
 kernel/sched/debug.c                   | 43 ++++++++++++++-
 kernel/sched/fair.c                    | 39 +++++++++++---
 kernel/sched/features.h                |  4 ++
 16 files changed, 216 insertions(+), 24 deletions(-)
 create mode 100644 arch/arm64/include/asm/prefer_numa.h
 create mode 100644 arch/arm64/kernel/prefer_numa.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 5422d1502fd6..b1f550c8c82a 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -105,6 +105,7 @@ config ARM64
 	select ARCH_SUPPORTS_ATOMIC_RMW
 	select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
 	select ARCH_SUPPORTS_NUMA_BALANCING
+	select ARCH_SUPPORTS_SCHED_PARAL
 	select ARCH_SUPPORTS_PAGE_TABLE_CHECK
 	select ARCH_SUPPORTS_PER_VMA_LOCK
 	select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
diff --git a/arch/arm64/configs/openeuler_defconfig b/arch/arm64/configs/openeuler_defconfig
index 3cfff0701479..3d352fb1ae57 100644
--- a/arch/arm64/configs/openeuler_defconfig
+++ b/arch/arm64/configs/openeuler_defconfig
@@ -209,6 +209,7 @@ CONFIG_USER_NS=y
 CONFIG_PID_NS=y
 CONFIG_NET_NS=y
 CONFIG_SCHED_STEAL=y
+CONFIG_SCHED_PARAL=y
 CONFIG_CHECKPOINT_RESTORE=y
 CONFIG_SCHED_AUTOGROUP=y
 CONFIG_RELAY=y
diff --git a/arch/arm64/include/asm/prefer_numa.h b/arch/arm64/include/asm/prefer_numa.h
new file mode 100644
index 000000000000..6c8e2b2142b9
--- /dev/null
+++ b/arch/arm64/include/asm/prefer_numa.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef __ASM_PREFER_NUMA_H
+#define __ASM_PREFER_NUMA_H
+
+#include <linux/sched.h>
+
+#define PROBE_NUMA_PMU_NAME "hisi_sccl3_hha0"
+#define PROBE_NUMA_PMU_EVENT 0x02
+
+void set_task_paral_node(struct task_struct *p);
+int probe_pmu_numa_event(void);
+
+#endif /* __ASM_PREFER_NUMA_H */
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 3d404a2cc961..b936be9d8baa 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -84,6 +84,7 @@ obj-$(CONFIG_IPI_AS_NMI) += ipi_nmi.o
 obj-$(CONFIG_HISI_VIRTCCA_GUEST) += virtcca_cvm_guest.o virtcca_cvm_tsi.o
 obj-$(CONFIG_HISI_VIRTCCA_HOST) += virtcca_cvm_host.o
 CFLAGS_patch-scs.o += -mbranch-protection=none
+obj-$(CONFIG_SCHED_PARAL) += prefer_numa.o
 
 # Force dependency (vdso*-wrap.S includes vdso.so through incbin)
 $(obj)/vdso-wrap.o: $(obj)/vdso/vdso.so
diff --git a/arch/arm64/kernel/prefer_numa.c b/arch/arm64/kernel/prefer_numa.c
new file mode 100644
index 000000000000..394dd4098c8f
--- /dev/null
+++ b/arch/arm64/kernel/prefer_numa.c
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * choose a prefer numa node
+ *
+ * Copyright (C) 2025 Huawei Limited.
+ */
+#include <linux/perf_event.h>
+#include <asm/prefer_numa.h>
+
+static atomic_t paral_nid_last = ATOMIC_INIT(-1);
+
+int probe_pmu_numa_event(void)
+{
+	int cpu = -1;
+	struct perf_event *event;
+	struct perf_event_attr attr = {};
+	int type = perf_pmu_type_of_name(PROBE_NUMA_PMU_NAME);
+
+	if (type == -1)
+		return -EINVAL;
+
+	preempt_disable();
+	cpu = smp_processor_id();
+	preempt_enable();
+
+	attr.type = type;
+	attr.config = PROBE_NUMA_PMU_EVENT;
+	attr.size = sizeof(struct perf_event_attr);
+	attr.pinned = 1;
+	attr.disabled = 1;
+	attr.sample_period = 0;
+
+	event = perf_event_create_kernel_counter(&attr, cpu, NULL, NULL, NULL);
+	if (IS_ERR(event))
+		return PTR_ERR(event);
+
+	perf_event_release_kernel(event);
+
+	return 0;
+}
+
+static inline unsigned int update_sched_paral_nid(void)
+{
+	return (unsigned int)atomic_inc_return(&paral_nid_last);
+}
+
+void set_task_paral_node(struct task_struct *p)
+{
+	int nid;
+	int i = 0;
+	const cpumask_t *cpus_mask;
+
+	if (is_global_init(current))
+		return;
+
+	if (p->flags & PF_KTHREAD || p->tgid != p->pid)
+		return;
+
+	while (i < nr_node_ids) {
+		nid = update_sched_paral_nid() % nr_node_ids;
+		cpus_mask = cpumask_of_node(nid);
+
+		if (cpumask_empty(cpus_mask) ||
+			!cpumask_subset(cpus_mask, p->cpus_ptr)) {
+			i++;
+			continue;
+		}
+
+		cpumask_copy(p->prefer_cpus, cpus_mask);
+		break;
+	}
+}
diff --git a/fs/proc/array.c b/fs/proc/array.c
index a933a878df3c..6a4b0a850dce 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -439,9 +439,6 @@ __weak void arch_proc_pid_thread_features(struct seq_file *m,
 #ifdef CONFIG_QOS_SCHED_DYNAMIC_AFFINITY
 static void task_cpus_preferred(struct seq_file *m, struct task_struct *task)
 {
-	if (!dynamic_affinity_enabled())
-		return;
-
 	seq_printf(m, "Cpus_preferred:\t%*pb\n",
 		   cpumask_pr_args(task->prefer_cpus));
 	seq_printf(m, "Cpus_preferred_list:\t%*pbl\n",
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 826fb16906fe..14ed13f4b408 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1778,6 +1778,7 @@ extern void perf_event_task_tick(void);
 extern int perf_event_account_interrupt(struct perf_event *event);
 extern int perf_event_period(struct perf_event *event, u64 value);
 extern u64 perf_event_pause(struct perf_event *event, bool reset);
+extern int perf_pmu_type_of_name(const char *name);
 #else /* !CONFIG_PERF_EVENTS: */
 static inline void *
 perf_aux_output_begin(struct perf_output_handle *handle,
@@ -1864,6 +1865,7 @@ static inline u64 perf_event_pause(struct perf_event *event, bool reset)
 {
 	return 0;
 }
+static inline int perf_pmu_type_of_name(const char *name) { return -1; }
 #endif
 
 #if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_INTEL)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 3979c34e9b83..ee10780715f1 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2627,6 +2627,12 @@ static inline bool dynamic_affinity_enabled(void)
 {
 	return static_branch_unlikely(&__dynamic_affinity_switch);
 }
+
+#ifdef CONFIG_SCHED_PARAL
+bool sched_paral_used(void);
+#else
+static inline bool sched_paral_used(void) { return false; }
+#endif
 #endif
 
 #ifdef CONFIG_QOS_SCHED_SMART_GRID
diff --git a/init/Kconfig b/init/Kconfig
index c8bd58347a87..925e8517a7e8 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1484,6 +1484,28 @@ config SCHED_STEAL
 
 	  If unsure, say N here.
 
+#
+# For architectures that want to enable the support for SCHED_PARAL
+#
+config ARCH_SUPPORTS_SCHED_PARAL
+	bool
+
+config SCHED_PARAL
+	bool "Parallelly schedule processes on different NUMA nodes"
+	depends on ARCH_SUPPORTS_SCHED_PARAL
+	depends on QOS_SCHED_DYNAMIC_AFFINITY
+	default n
+	help
+	  By enabling this feature, processes can be scheduled in parallel
+	  on various NUMA nodes to better utilize the cache in NUMA node.
+	  The usage is restricted to the following scenarios:
+	  1. No CPU binding is performed for user-space processes;
+	  2. It is applicable to distributed applications, such as business
+	  architectures with one master and multiple slaves running in
+	  parallel;
+	  3. The existing "qos dynamic affinity" and "qos smart grid"
+	  features must not be used simultaneously.
+
 config CHECKPOINT_RESTORE
 	bool "Checkpoint/restore support"
 	depends on PROC_FS
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 417827f2c043..01a9b18d80ce 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3488,7 +3488,8 @@ static void cpuset_attach_task(struct cpuset *cs, struct task_struct *task)
 	WARN_ON_ONCE(set_cpus_allowed_ptr(task, cpus_attach));
 #ifdef CONFIG_QOS_SCHED_DYNAMIC_AFFINITY
 	cpumask_copy(prefer_cpus_attach, cs->prefer_cpus);
-	set_prefer_cpus_ptr(task, prefer_cpus_attach);
+	if (!sched_paral_used() || !cpumask_empty(prefer_cpus_attach))
+		set_prefer_cpus_ptr(task, prefer_cpus_attach);
 #endif
 
 	cpuset_change_task_nodemask(task, &cpuset_attach_nodemask_to);
@@ -4348,7 +4349,8 @@ static void cpuset_fork(struct task_struct *task)
 
 		set_cpus_allowed_ptr(task, current->cpus_ptr);
 #ifdef CONFIG_QOS_SCHED_DYNAMIC_AFFINITY
-		set_prefer_cpus_ptr(task, current->prefer_cpus);
+		if (!sched_paral_used() || !cpumask_empty(cs->prefer_cpus))
+			set_prefer_cpus_ptr(task, current->prefer_cpus);
 #endif
 		task->mems_allowed = current->mems_allowed;
 		return;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index f042d6101932..99f46f6ea198 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -13855,6 +13855,19 @@ static int __init perf_event_sysfs_init(void)
 }
 device_initcall(perf_event_sysfs_init);
 
+int perf_pmu_type_of_name(const char *name)
+{
+	unsigned int i;
+	struct pmu *pmu;
+
+	idr_for_each_entry(&pmu_idr, pmu, i) {
+		if (!strcmp(pmu->name, name))
+			return pmu->type;
+	}
+
+	return -1;
+}
+
 #ifdef CONFIG_CGROUP_PERF
 static struct cgroup_subsys_state *
 perf_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
diff --git a/kernel/fork.c b/kernel/fork.c
index 96c6a9e446ac..8b2ff47de685 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -631,8 +631,7 @@ void free_task(struct task_struct *tsk)
 	free_kthread_struct(tsk);
 	bpf_task_storage_free(tsk);
 #ifdef CONFIG_QOS_SCHED_DYNAMIC_AFFINITY
-	if (dynamic_affinity_enabled())
-		sched_prefer_cpus_free(tsk);
+	sched_prefer_cpus_free(tsk);
 #endif
 #ifdef CONFIG_QOS_SCHED_SMART_GRID
 	if (smart_grid_enabled())
@@ -2451,11 +2450,9 @@ __latent_entropy struct task_struct *copy_process(
 #endif
 
 #ifdef CONFIG_QOS_SCHED_DYNAMIC_AFFINITY
-	if (dynamic_affinity_enabled()) {
-		retval = sched_prefer_cpus_fork(p, current->prefer_cpus);
-		if (retval)
-			goto bad_fork_free;
-	}
+	retval = sched_prefer_cpus_fork(p, current->prefer_cpus);
+	if (retval)
+		goto bad_fork_free;
 #endif
 
 	lockdep_assert_irqs_enabled();
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1b497efc763b..fab904f44c87 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -12142,9 +12142,6 @@ static int __set_prefer_cpus_ptr(struct task_struct *p,
 	struct rq *rq;
 	int ret = 0;
 
-	if (!dynamic_affinity_enabled())
-		return -EPERM;
-
 	if (unlikely(!p->prefer_cpus))
 		return -EINVAL;
 
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 7a9e6896c699..793019869da9 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -7,6 +7,10 @@
  * Copyright(C) 2007, Red Hat, Inc., Ingo Molnar
  */
 
+#ifdef CONFIG_SCHED_PARAL
+#include <asm/prefer_numa.h>
+#endif
+
 /*
  * This allows printing both to /proc/sched_debug and
  * to the console
@@ -95,6 +99,39 @@ static void sched_feat_disable(int i) { };
 static void sched_feat_enable(int i) { };
 #endif /* CONFIG_JUMP_LABEL */
 
+#ifdef CONFIG_SCHED_PARAL
+static void sched_feat_disable_paral(char *cmp)
+{
+	struct task_struct *tsk, *t;
+
+	if (strncmp(cmp, "PARAL", 5) == 0) {
+		read_lock(&tasklist_lock);
+		for_each_process(tsk) {
+			if (tsk->flags & PF_KTHREAD || is_global_init(tsk))
+				continue;
+
+			for_each_thread(tsk, t)
+				cpumask_clear(t->prefer_cpus);
+		}
+		read_unlock(&tasklist_lock);
+	}
+}
+
+static bool sched_feat_enable_paral(char *cmp)
+{
+	if (strncmp(cmp, "PARAL", 5) != 0)
+		return true;
+
+	if (probe_pmu_numa_event() != 0)
+		return false;
+
+	return true;
+}
+#else
+static void sched_feat_disable_paral(char *cmp) {};
+static bool sched_feat_enable_paral(char *cmp) { return true; };
+#endif /* CONFIG_SCHED_PARAL */
+
 static int sched_feat_set(char *cmp)
 {
 	int i;
@@ -111,8 +148,12 @@ static int sched_feat_set(char *cmp)
 
 	if (neg) {
 		sysctl_sched_features &= ~(1UL << i);
+		sched_feat_disable_paral(cmp);
 		sched_feat_disable(i);
 	} else {
+		if (!sched_feat_enable_paral(cmp))
+			return -EPERM;
+
 		sysctl_sched_features |= (1UL << i);
 		sched_feat_enable(i);
 	}
@@ -1045,7 +1086,7 @@ void proc_sched_show_task(struct task_struct *p, struct pid_namespace *ns,
 	P_SCHEDSTAT(nr_wakeups_passive);
 	P_SCHEDSTAT(nr_wakeups_idle);
 #ifdef CONFIG_QOS_SCHED_DYNAMIC_AFFINITY
-	if (dynamic_affinity_enabled()) {
+	if (dynamic_affinity_enabled() || sched_paral_used()) {
 		P_SCHEDSTAT(nr_wakeups_preferred_cpus);
 		P_SCHEDSTAT(nr_wakeups_force_preferred_cpus);
 	}
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 71661d6c5b54..8a32d0ac4a8b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -75,6 +75,10 @@
 #endif
 #include <linux/sched/grid_qos.h>
 
+#ifdef CONFIG_SCHED_PARAL
+#include <asm/prefer_numa.h>
+#endif
+
 /*
  * The initial- and re-scaling of tunables is configurable
  *
@@ -9057,6 +9061,12 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 }
 
 #ifdef CONFIG_QOS_SCHED_DYNAMIC_AFFINITY
+#ifdef CONFIG_SCHED_PARAL
+bool sched_paral_used(void)
+{
+	return sched_feat(PARAL);
+}
+#endif
 
 DEFINE_STATIC_KEY_FALSE(__dynamic_affinity_switch);
 
@@ -9084,16 +9094,15 @@ __setup("dynamic_affinity=", dynamic_affinity_switch_setup);
 
 static inline bool prefer_cpus_valid(struct task_struct *p)
 {
-	struct cpumask *prefer_cpus;
+	struct cpumask *prefer_cpus = task_prefer_cpus(p);
 
-	if (!dynamic_affinity_enabled())
-		return false;
-
-	prefer_cpus = task_prefer_cpus(p);
+	if (dynamic_affinity_enabled() || sched_paral_used()) {
+		return !cpumask_empty(prefer_cpus) &&
+			!cpumask_equal(prefer_cpus, p->cpus_ptr) &&
+			cpumask_subset(prefer_cpus, p->cpus_ptr);
+	}
 
-	return !cpumask_empty(prefer_cpus) &&
-		!cpumask_equal(prefer_cpus, p->cpus_ptr) &&
-		cpumask_subset(prefer_cpus, p->cpus_ptr);
+	return false;
 }
 
 static inline unsigned long taskgroup_cpu_util(struct task_group *tg,
@@ -9193,6 +9202,14 @@ static void set_task_select_cpus(struct task_struct *p, int *idlest_cpu,
 	}
 	rcu_read_unlock();
 
+	/* In extreme cases, it may cause uneven system load. */
+	if (sched_paral_used() && sysctl_sched_util_low_pct == 100 && nr_cpus_valid > 0) {
+		p->select_cpus = p->prefer_cpus;
+		if (sd_flag & SD_BALANCE_WAKE)
+			schedstat_inc(p->stats.nr_wakeups_preferred_cpus);
+		return;
+	}
+
 	/*
 	 * Follow cases should select cpus_ptr, checking by condition of
 	 * tg_capacity > nr_cpus_valid:
@@ -14679,6 +14696,12 @@ static void task_fork_fair(struct task_struct *p)
 	if (curr)
 		update_curr(cfs_rq);
 	place_entity(cfs_rq, se, ENQUEUE_INITIAL);
+
+#ifdef CONFIG_SCHED_PARAL
+	if (sched_paral_used())
+		set_task_paral_node(p);
+#endif
+
 	rq_unlock(rq, &rf);
 }
 
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index ea7ba74810e3..67939d04542f 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -61,6 +61,10 @@ SCHED_FEAT(SIS_UTIL, true)
 SCHED_FEAT(STEAL, false)
 #endif
 
+#ifdef CONFIG_SCHED_PARAL
+SCHED_FEAT(PARAL, false)
+#endif
+
 /*
  * Issue a WARN when we do multiple update_rq_clock() calls
  * in a single rq->lock section. Default disabled because the
-- 
2.25.1
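To make the node selection in set_task_paral_node() easier to follow, here is a rough userspace model of the same idea: a shared atomic counter hands out node IDs round-robin, and a node is skipped when it has no CPUs or when its CPUs are not all allowed for the task. This is a sketch under assumptions, not the patch's code: 64-bit masks stand in for cpumask_t, C11 atomics stand in for atomic_inc_return(), and the function name pick_parallel_node and all of its parameters are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Pick the next NUMA node in round-robin order.
 *
 * node_cpus[i] is a bitmask of the CPUs on node i, and allowed is the
 * task's allowed-CPU mask. On success the chosen node's mask is stored in
 * *preferred and the node ID is returned. If no node qualifies after
 * nr_nodes attempts, -1 is returned and *preferred is left untouched,
 * which mirrors the patch leaving prefer_cpus unchanged in that case.
 */
int pick_parallel_node(const uint64_t *node_cpus, unsigned int nr_nodes,
		       uint64_t allowed, atomic_uint *ticket,
		       uint64_t *preferred)
{
	assert(node_cpus != NULL && ticket != NULL && preferred != NULL);

	for (unsigned int tries = 0; tries < nr_nodes; tries++) {
		/* Every attempt consumes one ticket, as in the patch. */
		unsigned int nid = atomic_fetch_add(ticket, 1) % nr_nodes;
		uint64_t mask = node_cpus[nid];

		/* Skip memory-only nodes and nodes outside the allowed set. */
		if (mask == 0 || (mask & ~allowed) != 0)
			continue;

		*preferred = mask;
		return (int)nid;
	}

	return -1;
}
```

Because each attempt advances the shared counter, concurrent forks spread over different nodes without taking a lock; the trade-off is that skipped nodes also consume tickets, so the distribution is only approximately even when some nodes are ineligible.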
[PATCH OLK-6.6] serial: jsm: fix NPE during jsm_uart_port_init
by Yi Yang 30 Jul '25

From: Dustin Lundquist <dustin(a)null-ptr.net>

stable inclusion
from stable-v6.6.94
commit 3258d7ff8ebfa451426662b23e8f2b51b129afe1
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/ICLHPF
CVE: CVE-2025-38265

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

--------------------------------

commit e3975aa899c0a3bbc10d035e699b142cd1373a71 upstream.

No device was set which caused serial_base_ctrl_add to crash.

 BUG: kernel NULL pointer dereference, address: 0000000000000050
 Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
 CPU: 16 UID: 0 PID: 368 Comm: (udev-worker) Not tainted 6.12.25-amd64 #1 Debian 6.12.25-1
 RIP: 0010:serial_base_ctrl_add+0x96/0x120
 Call Trace:
  <TASK>
  serial_core_register_port+0x1a0/0x580
  ? __setup_irq+0x39c/0x660
  ? __kmalloc_cache_noprof+0x111/0x310
  jsm_uart_port_init+0xe8/0x180 [jsm]
  jsm_probe_one+0x1f4/0x410 [jsm]
  local_pci_probe+0x42/0x90
  pci_device_probe+0x22f/0x270
  really_probe+0xdb/0x340
  ? pm_runtime_barrier+0x54/0x90
  ? __pfx___driver_attach+0x10/0x10
  __driver_probe_device+0x78/0x110
  driver_probe_device+0x1f/0xa0
  __driver_attach+0xba/0x1c0
  bus_for_each_dev+0x8c/0xe0
  bus_add_driver+0x112/0x1f0
  driver_register+0x72/0xd0
  jsm_init_module+0x36/0xff0 [jsm]
  ? __pfx_jsm_init_module+0x10/0x10 [jsm]
  do_one_initcall+0x58/0x310
  do_init_module+0x60/0x230

Tested with Digi Neo PCIe 8 port card.

Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM")
Cc: stable <stable(a)kernel.org>
Signed-off-by: Dustin Lundquist <dustin(a)null-ptr.net>
Link: https://lore.kernel.org/r/3f31d4f75863614655c4673027a208be78d022ec.camel@nu…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Yi Yang <yiyang13(a)huawei.com>
---
 drivers/tty/serial/jsm/jsm_tty.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/tty/serial/jsm/jsm_tty.c b/drivers/tty/serial/jsm/jsm_tty.c
index 222afc270c88..1bee624bd484 100644
--- a/drivers/tty/serial/jsm/jsm_tty.c
+++ b/drivers/tty/serial/jsm/jsm_tty.c
@@ -451,6 +451,7 @@ int jsm_uart_port_init(struct jsm_board *brd)
 		if (!brd->channels[i])
 			continue;
 
+		brd->channels[i]->uart_port.dev = &brd->pci_dev->dev;
 		brd->channels[i]->uart_port.irq = brd->irq;
 		brd->channels[i]->uart_port.uartclk = 14745600;
 		brd->channels[i]->uart_port.type = PORT_JSM;
-- 
2.25.1
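The shape of this bug is an ordering requirement: the port's parent device pointer has to be filled in before the port is handed to the registration code, which dereferences it. The toy model below illustrates that ordering in userspace. All types and functions in it (sk_device, sk_port, sk_board, sk_register_port, sk_board_init) are hypothetical stand-ins, and unlike the real serial core the stand-in registration function reports a missing device with -EINVAL instead of crashing.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SK_MAX_CHANNELS 4

struct sk_device { const char *name; };

struct sk_port {
	struct sk_device *dev;	/* parent device, required by registration */
	int irq;
	unsigned int uartclk;
};

struct sk_board {
	struct sk_device pci_dev;
	int irq;
	struct sk_port *channels[SK_MAX_CHANNELS];
};

/* Stand-in for the registration path: it needs port->dev to be set. */
int sk_register_port(const struct sk_port *port)
{
	assert(port != NULL);
	if (!port->dev)
		return -EINVAL;
	return 0;
}

/* Initialise every populated channel, setting dev before registering. */
int sk_board_init(struct sk_board *brd)
{
	assert(brd != NULL);

	for (size_t i = 0; i < SK_MAX_CHANNELS; i++) {
		struct sk_port *port = brd->channels[i];
		int ret;

		if (!port)
			continue;

		port->dev = &brd->pci_dev;	/* the step the fix adds */
		port->irq = brd->irq;
		port->uartclk = 14745600;

		ret = sk_register_port(port);
		if (ret)
			return ret;
	}

	return 0;
}
```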
[PATCH OLK-5.10] scsi: target: Fix NULL pointer dereference in core_scsi3_decode_spec_i_port()
by Yifan Qiao 30 Jul '25

From: Maurizio Lombardi <mlombard(a)redhat.com>

mainline inclusion
from mainline-v6.16-rc3
commit d8ab68bdb294b09a761e967dad374f2965e1913f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/ICOXB6
CVE: CVE-2025-38399

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

The function core_scsi3_decode_spec_i_port(), in its error code path,
unconditionally calls core_scsi3_lunacl_undepend_item() passing the
dest_se_deve pointer, which may be NULL.

This can lead to a NULL pointer dereference if dest_se_deve remains
unset.

SPC-3 PR SPEC_I_PT: Unable to locate dest_tpg
Unable to handle kernel paging request at virtual address dfff800000000012
Call trace:
  core_scsi3_lunacl_undepend_item+0x2c/0xf0 [target_core_mod] (P)
  core_scsi3_decode_spec_i_port+0x120c/0x1c30 [target_core_mod]
  core_scsi3_emulate_pro_register+0x6b8/0xcd8 [target_core_mod]
  target_scsi3_emulate_pr_out+0x56c/0x840 [target_core_mod]

Fix this by adding a NULL check before calling
core_scsi3_lunacl_undepend_item()

Signed-off-by: Maurizio Lombardi <mlombard(a)redhat.com>
Link: https://lore.kernel.org/r/20250612101556.24829-1-mlombard@redhat.com
Reviewed-by: Mike Christie <michael.christie(a)oracle.com>
Reviewed-by: John Meneghini <jmeneghi(a)redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Yifan Qiao <qiaoyifan4(a)huawei.com>
---
 drivers/target/target_core_pr.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/target/target_core_pr.c b/drivers/target/target_core_pr.c
index b42193c554fb..2bc849799739 100644
--- a/drivers/target/target_core_pr.c
+++ b/drivers/target/target_core_pr.c
@@ -1858,7 +1858,9 @@ core_scsi3_decode_spec_i_port(
 		}
 
 		kmem_cache_free(t10_pr_reg_cache, dest_pr_reg);
-		core_scsi3_lunacl_undepend_item(dest_se_deve);
+
+		if (dest_se_deve)
+			core_scsi3_lunacl_undepend_item(dest_se_deve);
 
 		if (is_local)
 			continue;
-- 
2.39.2
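The pattern behind this fix is common in error paths: cleanup code runs for every partially constructed entry, so any pointer that an early failure may have left unset must be checked before it is released. A minimal userspace sketch of that pattern follows; the type sk_deve and both functions are hypothetical and only model the reference-drop step, not the target core's data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an object holding a dependency reference. */
struct sk_deve {
	int depend_count;
};

/* Drops one reference; like the kernel helper, it requires a valid pointer. */
void sk_undepend_item(struct sk_deve *deve)
{
	assert(deve != NULL);
	deve->depend_count--;
}

/*
 * Error-path cleanup for one destination entry. dest_deve may be NULL
 * when the lookup failed before it was assigned, so it is checked first.
 * Returns the number of references released.
 */
int sk_cleanup_dest(struct sk_deve *dest_deve)
{
	if (!dest_deve)
		return 0;

	sk_undepend_item(dest_deve);
	return 1;
}
```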
[PATCH OLK-6.6] scsi: target: Fix NULL pointer dereference in core_scsi3_decode_spec_i_port()
by Yifan Qiao 30 Jul '25

From: Maurizio Lombardi <mlombard(a)redhat.com>

mainline inclusion
from mainline-v6.16-rc3
commit d8ab68bdb294b09a761e967dad374f2965e1913f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/ICOXB6
CVE: CVE-2025-38399

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

The function core_scsi3_decode_spec_i_port(), in its error code path,
unconditionally calls core_scsi3_lunacl_undepend_item() passing the
dest_se_deve pointer, which may be NULL.

This can lead to a NULL pointer dereference if dest_se_deve remains
unset.

SPC-3 PR SPEC_I_PT: Unable to locate dest_tpg
Unable to handle kernel paging request at virtual address dfff800000000012
Call trace:
  core_scsi3_lunacl_undepend_item+0x2c/0xf0 [target_core_mod] (P)
  core_scsi3_decode_spec_i_port+0x120c/0x1c30 [target_core_mod]
  core_scsi3_emulate_pro_register+0x6b8/0xcd8 [target_core_mod]
  target_scsi3_emulate_pr_out+0x56c/0x840 [target_core_mod]

Fix this by adding a NULL check before calling
core_scsi3_lunacl_undepend_item()

Signed-off-by: Maurizio Lombardi <mlombard(a)redhat.com>
Link: https://lore.kernel.org/r/20250612101556.24829-1-mlombard@redhat.com
Reviewed-by: Mike Christie <michael.christie(a)oracle.com>
Reviewed-by: John Meneghini <jmeneghi(a)redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Yifan Qiao <qiaoyifan4(a)huawei.com>
---
 drivers/target/target_core_pr.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/target/target_core_pr.c b/drivers/target/target_core_pr.c
index 49d9167bb263..a9eb6a3e8383 100644
--- a/drivers/target/target_core_pr.c
+++ b/drivers/target/target_core_pr.c
@@ -1841,7 +1841,9 @@ core_scsi3_decode_spec_i_port(
 		}
 
 		kmem_cache_free(t10_pr_reg_cache, dest_pr_reg);
-		core_scsi3_lunacl_undepend_item(dest_se_deve);
+
+		if (dest_se_deve)
+			core_scsi3_lunacl_undepend_item(dest_se_deve);
 
 		if (is_local)
 			continue;
-- 
2.39.2
[PATCH OLK-6.6] mm/dynamic_pool: Fix free_huge_pages underflow problem
by Wupeng Ma 30 Jul '25

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I8S9BY
CVE: NA

--------------------------------

When mmap() is called with the MAP_NORESERVE flag, resv_huge_pages is
not checked, so a user can map more huge pages than are currently
available without the mmap() call failing.

During the actual allocation, if free_huge_pages is zero while
pool->freelist is still non-empty (because it contains splittable huge
pages), decrementing free_huge_pages underflows it.

To fix this, check free_huge_pages before allocating huge pages.

Fixes: 8ce9d44df8ec ("mm/dynamic_pool: support HugeTLB page allocation from dpool")
Signed-off-by: Wupeng Ma <mawupeng1(a)huawei.com>
---
 mm/dynamic_pool.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/dynamic_pool.c b/mm/dynamic_pool.c
index 063fb6092401..cdd97bdfc8ac 100644
--- a/mm/dynamic_pool.c
+++ b/mm/dynamic_pool.c
@@ -919,6 +919,9 @@ struct folio *dynamic_pool_alloc_hugepage(struct hugetlbfs_inode_info *p,
 	if (!dpool->online)
 		goto unlock;

+	if (!pool->free_huge_pages)
+		goto unlock;
+
 	list_for_each_entry(folio, &pool->freelist, lru) {
 		if (folio_test_hwpoison(folio))
 			continue;
--
2.43.0
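A small user-space model may help show why the counter has to be checked separately from the list. Everything here is an assumption made for illustration: struct pool_model and alloc_hugepage_model() are hypothetical, and the real pool keeps folios on a list rather than a length field.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the pool state described above. */
struct pool_model {
	unsigned long free_huge_pages;	/* huge pages ready to hand out */
	unsigned long freelist_len;	/* list entries; may include splittable pages */
};

/*
 * Returns 1 on success, 0 when no free huge page is available.
 * Without the first check, a non-empty list combined with a zero
 * counter would decrement an unsigned zero and wrap to ULONG_MAX.
 */
static int alloc_hugepage_model(struct pool_model *pool)
{
	assert(pool != NULL);

	if (!pool->free_huge_pages)
		return 0;
	if (!pool->freelist_len)
		return 0;

	pool->freelist_len--;
	pool->free_huge_pages--;
	return 1;
}
```

In this model a pool with a zero counter and three list entries refuses the allocation and leaves both fields untouched.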
[PATCH OLK-5.10] scsi: lpfc: Use memcpy() for BIOS version
by Yi Yang 30 Jul '25

From: Daniel Wagner <wagi(a)kernel.org>

stable inclusion
from stable-v5.10.239
commit b699bda5db818b684ff62d140defd6394f38f3d6
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/ICLHXS
CVE: CVE-2025-38332

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

--------------------------------

[ Upstream commit ae82eaf4aeea060bb736c3e20c0568b67c701d7d ]

The strlcat() with FORTIFY support is triggering a panic because it
thinks the target buffer will overflow although the correct target
buffer size is passed in.

Anyway, instead of memset() with 0 followed by a strlcat(), just use
memcpy() and ensure that the resulting buffer is NULL terminated.

BIOSVersion is only used for the lpfc_printf_log() which expects a
properly terminated string.

Signed-off-by: Daniel Wagner <wagi(a)kernel.org>
Link: https://lore.kernel.org/r/20250409-fix-lpfc-bios-str-v1-1-05dac9e51e13@kern…
Reviewed-by: Justin Tee <justin.tee(a)broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Yi Yang <yiyang13(a)huawei.com>
---
 drivers/scsi/lpfc/lpfc_sli.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index 1f1d41a9cdfa..f6ce86759ffe 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -5543,9 +5543,9 @@ lpfc_sli4_get_ctl_attr(struct lpfc_hba *phba)
 	phba->sli4_hba.lnk_info.lnk_no =
 		bf_get(lpfc_cntl_attr_lnk_numb, cntl_attr);

-	memset(phba->BIOSVersion, 0, sizeof(phba->BIOSVersion));
-	strlcat(phba->BIOSVersion, (char *)cntl_attr->bios_ver_str,
+	memcpy(phba->BIOSVersion, cntl_attr->bios_ver_str,
 		sizeof(phba->BIOSVersion));
+	phba->BIOSVersion[sizeof(phba->BIOSVersion) - 1] = '\0';

 	lpfc_printf_log(phba, KERN_INFO, LOG_SLI,
 			"3086 lnk_type:%d, lnk_numb:%d, bios_ver:%s\n",
--
2.25.1
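The copy-then-terminate pattern used by this patch can be sketched in plain C as below. copy_fixed_string() is a hypothetical helper written for illustration, and it assumes the source buffer is at least as large as the destination, since memcpy() reads exactly size bytes from it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy a fixed-size field that may lack a terminator, then force the
 * last byte to '\0' so the result is always a valid C string.
 * Assumes src points to at least `size` readable bytes.
 */
static void copy_fixed_string(char *dst, const char *src, size_t size)
{
	assert(size > 0);
	memcpy(dst, src, size);
	dst[size - 1] = '\0';
}
```

The trade-off is that a source filling the whole field loses its last byte, which is acceptable here because the string is only used for logging.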
[PATCH OLK-6.6] bnxt: properly flush XDP redirect lists
by Fanhua Li 30 Jul '25

From: Yan Zhai <yan(a)cloudflare.com>

stable inclusion
from stable-v6.6.97
commit 16254aa985d14dee050564c4a3936f3dc096e1f7
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/ICL7XZ
CVE: CVE-2025-38246

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

--------------------------------

[ Upstream commit 9caca6ac0e26cd20efd490d8b3b2ffb1c7c00f6f ]

We encountered following crash when testing a XDP_REDIRECT feature
in production:

[56251.579676] list_add corruption. next->prev should be prev (ffff93120dd40f30), but was ffffb301ef3a6740. (next=ffff93120dd40f30).
[56251.601413] ------------[ cut here ]------------
[56251.611357] kernel BUG at lib/list_debug.c:29!
[56251.621082] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[56251.632073] CPU: 111 UID: 0 PID: 0 Comm: swapper/111 Kdump: loaded Tainted: P O 6.12.33-cloudflare-2025.6.3 #1
[56251.653155] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE
[56251.663877] Hardware name: MiTAC GC68B-B8032-G11P6-GPU/S8032GM-HE-CFR, BIOS V7.020.B10-sig 01/22/2025
[56251.682626] RIP: 0010:__list_add_valid_or_report+0x4b/0xa0
[56251.693203] Code: 0e 48 c7 c7 68 e7 d9 97 e8 42 16 fe ff 0f 0b 48 8b 52 08 48 39 c2 74 14 48 89 f1 48 c7 c7 90 e7 d9 97 48 89 c6 e8 25 16 fe ff <0f> 0b 4c 8b 02 49 39 f0 74 14 48 89 d1 48 c7 c7 e8 e7 d9 97 4c 89
[56251.725811] RSP: 0018:ffff93120dd40b80 EFLAGS: 00010246
[56251.736094] RAX: 0000000000000075 RBX: ffffb301e6bba9d8 RCX: 0000000000000000
[56251.748260] RDX: 0000000000000000 RSI: ffff9149afda0b80 RDI: ffff9149afda0b80
[56251.760349] RBP: ffff9131e49c8000 R08: 0000000000000000 R09: ffff93120dd40a18
[56251.772382] R10: ffff9159cf2ce1a8 R11: 0000000000000003 R12: ffff911a80850000
[56251.784364] R13: ffff93120fbc7000 R14: 0000000000000010 R15: ffff9139e7510e40
[56251.796278] FS: 0000000000000000(0000) GS:ffff9149afd80000(0000) knlGS:0000000000000000
[56251.809133] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[56251.819561] CR2: 00007f5e85e6f300 CR3: 00000038b85e2006 CR4: 0000000000770ef0
[56251.831365] PKRU: 55555554
[56251.838653] Call Trace:
[56251.845560]  <IRQ>
[56251.851943]  cpu_map_enqueue.cold+0x5/0xa
[56251.860243]  xdp_do_redirect+0x2d9/0x480
[56251.868388]  bnxt_rx_xdp+0x1d8/0x4c0 [bnxt_en]
[56251.877028]  bnxt_rx_pkt+0x5f7/0x19b0 [bnxt_en]
[56251.885665]  ? cpu_max_write+0x1e/0x100
[56251.893510]  ? srso_alias_return_thunk+0x5/0xfbef5
[56251.902276]  __bnxt_poll_work+0x190/0x340 [bnxt_en]
[56251.911058]  bnxt_poll+0xab/0x1b0 [bnxt_en]
[56251.919041]  ? srso_alias_return_thunk+0x5/0xfbef5
[56251.927568]  ? srso_alias_return_thunk+0x5/0xfbef5
[56251.935958]  ? srso_alias_return_thunk+0x5/0xfbef5
[56251.944250]  __napi_poll+0x2b/0x160
[56251.951155]  bpf_trampoline_6442548651+0x79/0x123
[56251.959262]  __napi_poll+0x5/0x160
[56251.966037]  net_rx_action+0x3d2/0x880
[56251.973133]  ? srso_alias_return_thunk+0x5/0xfbef5
[56251.981265]  ? srso_alias_return_thunk+0x5/0xfbef5
[56251.989262]  ? __hrtimer_run_queues+0x162/0x2a0
[56251.996967]  ? srso_alias_return_thunk+0x5/0xfbef5
[56252.004875]  ? srso_alias_return_thunk+0x5/0xfbef5
[56252.012673]  ? bnxt_msix+0x62/0x70 [bnxt_en]
[56252.019903]  handle_softirqs+0xcf/0x270
[56252.026650]  irq_exit_rcu+0x67/0x90
[56252.032933]  common_interrupt+0x85/0xa0
[56252.039498]  </IRQ>
[56252.044246]  <TASK>
[56252.048935]  asm_common_interrupt+0x26/0x40
[56252.055727] RIP: 0010:cpuidle_enter_state+0xb8/0x420
[56252.063305] Code: dc 01 00 00 e8 f9 79 3b ff e8 64 f7 ff ff 49 89 c5 0f 1f 44 00 00 31 ff e8 a5 32 3a ff 45 84 ff 0f 85 ae 01 00 00 fb 45 85 f6 <0f> 88 88 01 00 00 48 8b 04 24 49 63 ce 4c 89 ea 48 6b f1 68 48 29
[56252.088911] RSP: 0018:ffff93120c97fe98 EFLAGS: 00000202
[56252.096912] RAX: ffff9149afd80000 RBX: ffff9141d3a72800 RCX: 0000000000000000
[56252.106844] RDX: 00003329176c6b98 RSI: ffffffe36db3fdc7 RDI: 0000000000000000
[56252.116733] RBP: 0000000000000002 R08: 0000000000000002 R09: 000000000000004e
[56252.126652] R10: ffff9149afdb30c4 R11: 071c71c71c71c71c R12: ffffffff985ff860
[56252.136637] R13: 00003329176c6b98 R14: 0000000000000002 R15: 0000000000000000
[56252.146667]  ? cpuidle_enter_state+0xab/0x420
[56252.153909]  cpuidle_enter+0x2d/0x40
[56252.160360]  do_idle+0x176/0x1c0
[56252.166456]  cpu_startup_entry+0x29/0x30
[56252.173248]  start_secondary+0xf7/0x100
[56252.179941]  common_startup_64+0x13e/0x141
[56252.186886]  </TASK>

From the crash dump, we found that the cpu_map_flush_list inside
redirect info is partially corrupted: its list_head->next points to
itself, but list_head->prev points to a valid list of unflushed bq
entries.

This turned out to be a result of missed XDP flush on redirect lists. By
digging in the actual source code, we found that
commit 7f0a168b0441 ("bnxt_en: Add completion ring pointer in TX and RX
ring structures") incorrectly overwrites the event mask for XDP_REDIRECT
in bnxt_rx_xdp. We can stably reproduce this crash by returning XDP_TX
and XDP_REDIRECT randomly for incoming packets in a naive XDP program.
Properly propagate the XDP_REDIRECT events back fixes the crash.
Fixes: a7559bc8c17c ("bnxt: support transmit and free of aggregation buffers")
Tested-by: Andrew Rzeznik <arzeznik(a)cloudflare.com>
Signed-off-by: Yan Zhai <yan(a)cloudflare.com>
Acked-by: Jesper Dangaard Brouer <hawk(a)kernel.org>
Reviewed-by: Michael Chan <michael.chan(a)broadcom.com>
Reviewed-by: Andy Gospodarek <gospo(a)broadcom.com>
Link: https://patch.msgid.link/aFl7jpCNzscumuN2@debian.debian
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Fanhua Li <lifanhua5(a)huawei.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index bb264ea2e914..83b0b596dcc4 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -2499,6 +2499,7 @@ static int __bnxt_poll_work(struct bnxt *bp, struct bnxt_cp_ring_info *cpr,
 {
 	struct bnxt_napi *bnapi = cpr->bnapi;
 	u32 raw_cons = cpr->cp_raw_cons;
+	bool flush_xdp = false;
 	u32 cons;
 	int tx_pkts = 0;
 	int rx_pkts = 0;
@@ -2536,6 +2537,8 @@ static int __bnxt_poll_work(struct bnxt *bp, struct bnxt_cp_ring_info *cpr,
 			else
 				rc = bnxt_force_rx_discard(bp, cpr, &raw_cons,
 							   &event);
+			if (event & BNXT_REDIRECT_EVENT)
+				flush_xdp = true;
 			if (likely(rc >= 0))
 				rx_pkts += rc;
 			/* Increment rx_pkts when rc is -ENOMEM to count towards
@@ -2563,8 +2566,10 @@ static int __bnxt_poll_work(struct bnxt *bp, struct bnxt_cp_ring_info *cpr,
 		}
 	}

-	if (event & BNXT_REDIRECT_EVENT)
+	if (flush_xdp) {
 		xdp_do_flush();
+		event &= ~BNXT_REDIRECT_EVENT;
+	}

 	if (event & BNXT_TX_EVENT) {
 		struct bnxt_tx_ring_info *txr = bnapi->tx_ring;
--
2.43.0
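The idea behind the flush_xdp flag can be modelled in user space as a sticky boolean that survives later overwrites of the event mask. This is a simplified sketch under stated assumptions: EV_TX, EV_REDIRECT, handle_packet_model() and poll_work_model() are hypothetical names, the per-packet handler is assumed to replace the mask outright, and a counter stands in for the flush call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EV_TX		0x1u	/* hypothetical event bits for this model */
#define EV_REDIRECT	0x2u

/* Models a per-packet handler that overwrites the whole event mask. */
static void handle_packet_model(unsigned int verdict, unsigned int *event)
{
	*event = verdict;
}

/*
 * Process n packets. A redirect seen on any packet is latched into a
 * local flag, so a later packet that overwrites the mask cannot hide
 * it. The flush runs once after the loop and the redirect bit is then
 * cleared from the mask returned to the caller.
 */
static unsigned int poll_work_model(const unsigned int *verdicts, size_t n,
				    int *flush_calls)
{
	unsigned int event = 0;
	bool flush = false;
	size_t i;

	assert(verdicts != NULL || n == 0);

	for (i = 0; i < n; i++) {
		handle_packet_model(verdicts[i], &event);
		if (event & EV_REDIRECT)
			flush = true;
	}

	if (flush) {
		(*flush_calls)++;	/* stands in for the flush call */
		event &= ~EV_REDIRECT;
	}
	return event;
}
```

With a redirect followed by a transmit in the same poll, testing only the final mask would skip the flush; the latched flag does not.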
[PATCH openEuler-1.0-LTS] calipso: Fix null-ptr-deref in calipso_req_{set,del}attr().
by Fanhua Li 30 Jul '25

From: Kuniyuki Iwashima <kuniyu(a)google.com>

mainline inclusion
from mainline-v6.16-rc3
commit 10876da918fa1aec0227fb4c67647513447f53a9
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/ICK4NZ
CVE: CVE-2025-38181

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

--------------------------------

syzkaller reported a null-ptr-deref in sock_omalloc() while allocating
a CALIPSO option. [0]

The NULL is of struct sock, which was fetched by sk_to_full_sk() in
calipso_req_setattr().

Since commit a1a5344ddbe8 ("tcp: avoid two atomic ops for syncookies"),
reqsk->rsk_listener could be NULL when SYN Cookie is returned to its
client, as hinted by the leading SYN Cookie log.

Here are 3 options to fix the bug:

  1) Return 0 in calipso_req_setattr()
  2) Return an error in calipso_req_setattr()
  3) Always set rsk_listener

1) is no go as it bypasses LSM, but 2) effectively disables SYN Cookie
for CALIPSO. 3) is also no go as there have been many efforts to reduce
atomic ops and make TCP robust against DDoS. See also commit 3b24d854cb35
("tcp/dccp: do not touch listener sk_refcnt under synflood").

As of the blamed commit, SYN Cookie already did not need refcounting,
and no one has stumbled on the bug for 9 years, so no CALIPSO user will
care about SYN Cookie.

Let's return an error in calipso_req_setattr() and calipso_req_delattr()
in the SYN Cookie case.

This can be reproduced by [1] on Fedora and now connect() of nc times out.

[0]:
TCP: request_sock_TCPv6: Possible SYN flooding on port [::]:20002. Sending cookies.
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000006: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037]
CPU: 3 UID: 0 PID: 12262 Comm: syz.1.2611 Not tainted 6.14.0 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
RIP: 0010:read_pnet include/net/net_namespace.h:406 [inline]
RIP: 0010:sock_net include/net/sock.h:655 [inline]
RIP: 0010:sock_kmalloc+0x35/0x170 net/core/sock.c:2806
Code: 89 d5 41 54 55 89 f5 53 48 89 fb e8 25 e3 c6 fd e8 f0 91 e3 00 48 8d 7b 30 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 26 01 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8b
RSP: 0018:ffff88811af89038 EFLAGS: 00010216
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffff888105266400
RDX: 0000000000000006 RSI: ffff88800c890000 RDI: 0000000000000030
RBP: 0000000000000050 R08: 0000000000000000 R09: ffff88810526640e
R10: ffffed1020a4cc81 R11: ffff88810526640f R12: 0000000000000000
R13: 0000000000000820 R14: ffff888105266400 R15: 0000000000000050
FS: 00007f0653a07640(0000) GS:ffff88811af80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f863ba096f4 CR3: 00000000163c0005 CR4: 0000000000770ef0
PKRU: 80000000
Call Trace:
 <IRQ>
 ipv6_renew_options+0x279/0x950 net/ipv6/exthdrs.c:1288
 calipso_req_setattr+0x181/0x340 net/ipv6/calipso.c:1204
 calipso_req_setattr+0x56/0x80 net/netlabel/netlabel_calipso.c:597
 netlbl_req_setattr+0x18a/0x440 net/netlabel/netlabel_kapi.c:1249
 selinux_netlbl_inet_conn_request+0x1fb/0x320 security/selinux/netlabel.c:342
 selinux_inet_conn_request+0x1eb/0x2c0 security/selinux/hooks.c:5551
 security_inet_conn_request+0x50/0xa0 security/security.c:4945
 tcp_v6_route_req+0x22c/0x550 net/ipv6/tcp_ipv6.c:825
 tcp_conn_request+0xec8/0x2b70 net/ipv4/tcp_input.c:7275
 tcp_v6_conn_request+0x1e3/0x440 net/ipv6/tcp_ipv6.c:1328
 tcp_rcv_state_process+0xafa/0x52b0 net/ipv4/tcp_input.c:6781
 tcp_v6_do_rcv+0x8a6/0x1a40 net/ipv6/tcp_ipv6.c:1667
 tcp_v6_rcv+0x505e/0x5b50 net/ipv6/tcp_ipv6.c:1904
 ip6_protocol_deliver_rcu+0x17c/0x1da0 net/ipv6/ip6_input.c:436
 ip6_input_finish+0x103/0x180 net/ipv6/ip6_input.c:480
 NF_HOOK include/linux/netfilter.h:314 [inline]
 NF_HOOK include/linux/netfilter.h:308 [inline]
 ip6_input+0x13c/0x6b0 net/ipv6/ip6_input.c:491
 dst_input include/net/dst.h:469 [inline]
 ip6_rcv_finish net/ipv6/ip6_input.c:79 [inline]
 ip6_rcv_finish+0xb6/0x490 net/ipv6/ip6_input.c:69
 NF_HOOK include/linux/netfilter.h:314 [inline]
 NF_HOOK include/linux/netfilter.h:308 [inline]
 ipv6_rcv+0xf9/0x490 net/ipv6/ip6_input.c:309
 __netif_receive_skb_one_core+0x12e/0x1f0 net/core/dev.c:5896
 __netif_receive_skb+0x1d/0x170 net/core/dev.c:6009
 process_backlog+0x41e/0x13b0 net/core/dev.c:6357
 __napi_poll+0xbd/0x710 net/core/dev.c:7191
 napi_poll net/core/dev.c:7260 [inline]
 net_rx_action+0x9de/0xde0 net/core/dev.c:7382
 handle_softirqs+0x19a/0x770 kernel/softirq.c:561
 do_softirq.part.0+0x36/0x70 kernel/softirq.c:462
 </IRQ>
 <TASK>
 do_softirq arch/x86/include/asm/preempt.h:26 [inline]
 __local_bh_enable_ip+0xf1/0x110 kernel/softirq.c:389
 local_bh_enable include/linux/bottom_half.h:33 [inline]
 rcu_read_unlock_bh include/linux/rcupdate.h:919 [inline]
 __dev_queue_xmit+0xc2a/0x3c40 net/core/dev.c:4679
 dev_queue_xmit include/linux/netdevice.h:3313 [inline]
 neigh_hh_output include/net/neighbour.h:523 [inline]
 neigh_output include/net/neighbour.h:537 [inline]
 ip6_finish_output2+0xd69/0x1f80 net/ipv6/ip6_output.c:141
 __ip6_finish_output net/ipv6/ip6_output.c:215 [inline]
 ip6_finish_output+0x5dc/0xd60 net/ipv6/ip6_output.c:226
 NF_HOOK_COND include/linux/netfilter.h:303 [inline]
 ip6_output+0x24b/0x8d0 net/ipv6/ip6_output.c:247
 dst_output include/net/dst.h:459 [inline]
 NF_HOOK include/linux/netfilter.h:314 [inline]
 NF_HOOK include/linux/netfilter.h:308 [inline]
 ip6_xmit+0xbbc/0x20d0 net/ipv6/ip6_output.c:366
 inet6_csk_xmit+0x39a/0x720 net/ipv6/inet6_connection_sock.c:135
 __tcp_transmit_skb+0x1a7b/0x3b40 net/ipv4/tcp_output.c:1471
 tcp_transmit_skb net/ipv4/tcp_output.c:1489 [inline]
 tcp_send_syn_data net/ipv4/tcp_output.c:4059 [inline]
 tcp_connect+0x1c0c/0x4510 net/ipv4/tcp_output.c:4148
 tcp_v6_connect+0x156c/0x2080 net/ipv6/tcp_ipv6.c:333
 __inet_stream_connect+0x3a7/0xed0 net/ipv4/af_inet.c:677
 tcp_sendmsg_fastopen+0x3e2/0x710 net/ipv4/tcp.c:1039
 tcp_sendmsg_locked+0x1e82/0x3570 net/ipv4/tcp.c:1091
 tcp_sendmsg+0x2f/0x50 net/ipv4/tcp.c:1358
 inet6_sendmsg+0xb9/0x150 net/ipv6/af_inet6.c:659
 sock_sendmsg_nosec net/socket.c:718 [inline]
 __sock_sendmsg+0xf4/0x2a0 net/socket.c:733
 __sys_sendto+0x29a/0x390 net/socket.c:2187
 __do_sys_sendto net/socket.c:2194 [inline]
 __se_sys_sendto net/socket.c:2190 [inline]
 __x64_sys_sendto+0xe1/0x1c0 net/socket.c:2190
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xc3/0x1d0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f06553c47ed
Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f0653a06fc8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007f0655605fa0 RCX: 00007f06553c47ed
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000000b
RBP: 00007f065545db38 R08: 0000200000000140 R09: 000000000000001c
R10: f7384d4ea84b01bd R11: 0000000000000246 R12: 0000000000000000
R13: 00007f0655605fac R14: 00007f0655606038 R15: 00007f06539e7000
 </TASK>
Modules linked in:

[1]:
dnf install -y selinux-policy-targeted policycoreutils netlabel_tools procps-ng nmap-ncat
mount -t selinuxfs none /sys/fs/selinux
load_policy
netlabelctl calipso add pass doi:1
netlabelctl map del default
netlabelctl map add default address:::1 protocol:calipso,1
sysctl net.ipv4.tcp_syncookies=2
nc -l ::1 80 &
nc ::1 80

Fixes: e1adea927080 ("calipso: Allow request sockets to be relabelled by the lsm.")
Reported-by: syzkaller <syzkaller(a)googlegroups.com>
Reported-by: John Cheung <john.cs.hey(a)gmail.com>
Closes: https://lore.kernel.org/netdev/CAP=Rh=MvfhrGADy+-WJiftV2_WzMH4VEhEFmeT28qY+…
Signed-off-by: Kuniyuki Iwashima <kuniyu(a)google.com>
Acked-by: Paul Moore <paul(a)paul-moore.com>
Link: https://patch.msgid.link/20250617224125.17299-1-kuni1840@gmail.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Fanhua Li <lifanhua5(a)huawei.com>
---
 net/ipv6/calipso.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/net/ipv6/calipso.c b/net/ipv6/calipso.c
index afc76062e1a1..12e68067343d 100644
--- a/net/ipv6/calipso.c
+++ b/net/ipv6/calipso.c
@@ -1209,6 +1209,10 @@ static int calipso_req_setattr(struct request_sock *req,
 	struct ipv6_opt_hdr *old, *new;
 	struct sock *sk = sk_to_full_sk(req_to_sk(req));

+	/* sk is NULL for SYN+ACK w/ SYN Cookie */
+	if (!sk)
+		return -ENOMEM;
+
 	if (req_inet->ipv6_opt && req_inet->ipv6_opt->hopopt)
 		old = req_inet->ipv6_opt->hopopt;
 	else
@@ -1249,6 +1253,10 @@ static void calipso_req_delattr(struct request_sock *req)
 	struct ipv6_txoptions *txopts;
 	struct sock *sk = sk_to_full_sk(req_to_sk(req));

+	/* sk is NULL for SYN+ACK w/ SYN Cookie */
+	if (!sk)
+		return;
+
 	if (!req_inet->ipv6_opt || !req_inet->ipv6_opt->hopopt)
 		return;

--
2.43.0
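Option 2 from the commit message, returning an error when no listener socket is attached to the request, can be illustrated with a reduced user-space model. All names here (struct listener_model, struct request_model, req_setattr_model(), req_delattr_model()) are hypothetical, and the opt_len field merely stands in for the option state the real functions manage.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the listener socket and the request. */
struct listener_model {
	int opt_len;
};

struct request_model {
	struct listener_model *listener;	/* NULL in the SYN Cookie case */
};

/* Set an attribute; fails with -ENOMEM when there is no listener. */
static int req_setattr_model(struct request_model *req, int opt_len)
{
	struct listener_model *sk;

	assert(req != NULL);
	sk = req->listener;
	if (!sk)
		return -ENOMEM;

	sk->opt_len = opt_len;
	return 0;
}

/* Delete the attribute; silently does nothing without a listener. */
static void req_delattr_model(struct request_model *req)
{
	struct listener_model *sk;

	assert(req != NULL);
	sk = req->listener;
	if (!sk)
		return;

	sk->opt_len = 0;
}
```

Failing the set operation keeps the security hook from being bypassed, at the cost of the connection attempt not completing in the SYN Cookie case, as the commit message notes.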