mailweb.openeuler.org
Kernel

kernel@openeuler.org

December 2023

  • 81 participants
  • 533 discussions
[PATCH openEuler-1.0-LTS v2] lib/notifier-error-inject: fix error when writing -errno to debugfs file
by Zhao Wenhui 14 Dec '23

From: Akinobu Mita <akinobu.mita(a)gmail.com>

mainline inclusion
from mainline-v6.2-rc1
commit f883c3edd2c432a2931ec8773c70a570115a50fe
category: bugfix
bugzilla: 189394, https://gitee.com/openeuler/kernel/issues/I8KYFI
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

The simple attribute files do not accept a negative value since
commit 488dac0c9237 ("libfs: fix error cast of negative value in
simple_attr_write()"). This restores the previous behaviour by using the
newly introduced DEFINE_SIMPLE_ATTRIBUTE_SIGNED instead of
DEFINE_SIMPLE_ATTRIBUTE.

Link: https://lkml.kernel.org/r/20220919172418.45257-3-akinobu.mita@gmail.com
Fixes: 488dac0c9237 ("libfs: fix error cast of negative value in simple_attr_write()")
Signed-off-by: Akinobu Mita <akinobu.mita(a)gmail.com>
Reported-by: Zhao Gongyi <zhaogongyi(a)huawei.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Rafael J. Wysocki <rafael(a)kernel.org>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Wei Yongjun <weiyongjun1(a)huawei.com>
Cc: Yicong Yang <yangyicong(a)hisilicon.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Zhao Wenhui <zhaowenhui8(a)huawei.com>
---
 lib/notifier-error-inject.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/notifier-error-inject.c b/lib/notifier-error-inject.c
index eb4a04afea80..125ea8ce23a4 100644
--- a/lib/notifier-error-inject.c
+++ b/lib/notifier-error-inject.c
@@ -14,7 +14,7 @@ static int debugfs_errno_get(void *data, u64 *val)
 	return 0;
 }

-DEFINE_SIMPLE_ATTRIBUTE(fops_errno, debugfs_errno_get, debugfs_errno_set,
+DEFINE_SIMPLE_ATTRIBUTE_SIGNED(fops_errno, debugfs_errno_get, debugfs_errno_set,
 	"%lld\n");

 static struct dentry *debugfs_create_errno(const char *name, umode_t mode,
-- 
2.34.1
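The underlying bug is that simple_attr_write() switched to an unsigned string parser, which rejects a leading '-', so writing "-12" (an errno value) to the debugfs file failed. A user-space sketch (illustrative only, not the kernel code; parse_unsigned/parse_signed are hypothetical names) contrasts the two parsing behaviours:

```c
#include <errno.h>
#include <stdlib.h>

/* Sketch of the unsigned-style parse (kstrtoull-like): a leading '-'
 * is rejected, which is why "-errno" writes started failing. */
static int parse_unsigned(const char *s, unsigned long long *out)
{
    char *end;

    if (s[0] == '-')            /* unsigned parser refuses negatives */
        return -EINVAL;
    errno = 0;
    *out = strtoull(s, &end, 0);
    return (errno || *end) ? -EINVAL : 0;
}

/* Sketch of the signed-style parse (kstrtoll-like) that the
 * _SIGNED attribute variant restores for this file. */
static int parse_signed(const char *s, long long *out)
{
    char *end;

    errno = 0;
    *out = strtoll(s, &end, 0);
    return (errno || *end) ? -EINVAL : 0;
}
```

With the signed parse, an input such as "-12" round-trips correctly instead of being rejected with -EINVAL.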
Re: [PATCH] memcg: support ksm merge any mode per cgroup
by Kefeng Wang 14 Dec '23

Should we add a new piece of logic here to handle PIDs that join the
cgroup later? Our memcg->ksm interface only does the enabling:
1) The current implementation walks the tasks already in the cgroup and
   enables ksm for them.
2) A PID added afterwards would also need to be checked and have ksm
   enabled when it joins.

On 2023/12/14 21:38, Nanyong Sun wrote:
> hulk inclusion
> category: feature
> bugzilla: https://gitee.com/openeuler/kernel/issues/I8OIQR
>
> ----------------------------------------------------------------------
>
> Add control file "memory.ksm" to enable ksm per cgroup.
> Echo to 1 will set all tasks currently in the cgroup to ksm merge
> any mode, which means ksm gets enabled for all vma's of a process.
> Meanwhile echo to 0 will disable ksm for them and unmerge the
> merged pages.
> Cat the file will show the above state and ksm related profits
> of this cgroup.
>
> Signed-off-by: Nanyong Sun <sunnanyong(a)huawei.com>
> ---
>  .../admin-guide/cgroup-v1/memory.rst |   1 +
>  mm/memcontrol.c                      | 110 +++++++++++++++++-
>  2 files changed, 109 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
> index ff456871bf4b..3fdb48435e8e 100644
> --- a/Documentation/admin-guide/cgroup-v1/memory.rst
> +++ b/Documentation/admin-guide/cgroup-v1/memory.rst
> @@ -109,6 +109,7 @@ Brief summary of control files.
>   memory.kmem.tcp.failcnt             show the number of tcp buf memory usage
>                                       hits limits
>   memory.kmem.tcp.max_usage_in_bytes  show max tcp buf memory usage recorded
> + memory.ksm                          set/show ksm merge any mode
>  ==================================== ==========================================
>
>  1. History
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 8a881ab21f6c..be37c2dda785 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -73,6 +73,7 @@
>  #include <linux/uaccess.h>
>
>  #include <trace/events/vmscan.h>
> +#include <linux/ksm.h>
>
>  struct cgroup_subsys memory_cgrp_subsys __read_mostly;
>  EXPORT_SYMBOL(memory_cgrp_subsys);
> @@ -230,10 +231,15 @@ enum res_type {
>  	     iter != NULL;				\
>  	     iter = mem_cgroup_iter(NULL, iter, NULL))
>
> +static inline bool __task_is_dying(struct task_struct *task)
> +{
> +	return tsk_is_oom_victim(task) || fatal_signal_pending(task) ||
> +		(task->flags & PF_EXITING);
> +}
> +
>  static inline bool task_is_dying(void)
>  {
> -	return tsk_is_oom_victim(current) || fatal_signal_pending(current) ||
> -		(current->flags & PF_EXITING);
> +	return __task_is_dying(current);
>  }
>
>  /* Some nice accessors for the vmpressure. */
> @@ -5010,6 +5016,98 @@ static int mem_cgroup_slab_show(struct seq_file *m, void *p)
>  }
>  #endif
>
> +#ifdef CONFIG_KSM
> +static int memcg_set_ksm_for_tasks(struct mem_cgroup *memcg, bool enable)
> +{
> +	struct task_struct *task;
> +	struct mm_struct *mm;
> +	struct css_task_iter it;
> +	int ret = 0;
> +
> +	css_task_iter_start(&memcg->css, CSS_TASK_ITER_PROCS, &it);
> +	while (!ret && (task = css_task_iter_next(&it))) {
> +		if (__task_is_dying(task))
> +			continue;
> +
> +		mm = get_task_mm(task);
> +		if (!mm)
> +			continue;
> +
> +		if (mmap_write_lock_killable(mm)) {
> +			mmput(mm);
> +			continue;
> +		}
> +
> +		if (enable)
> +			ret = ksm_enable_merge_any(mm);
> +		else
> +			ret = ksm_disable_merge_any(mm);
> +
> +		mmap_write_unlock(mm);
> +		mmput(mm);
> +	}
> +	css_task_iter_end(&it);
> +
> +	return ret;
> +}
> +
> +static int memory_ksm_show(struct seq_file *m, void *v)
> +{
> +	unsigned long ksm_merging_pages = 0;
> +	unsigned long ksm_rmap_items = 0;
> +	long ksm_process_profits = 0;
> +	unsigned int tasks = 0;
> +	struct task_struct *task;
> +	struct mm_struct *mm;
> +	struct css_task_iter it;
> +	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
> +
> +	css_task_iter_start(&memcg->css, CSS_TASK_ITER_PROCS, &it);
> +	while ((task = css_task_iter_next(&it))) {
> +		mm = get_task_mm(task);
> +		if (!mm)
> +			continue;
> +
> +		if (test_bit(MMF_VM_MERGE_ANY, &mm->flags))
> +			tasks++;
> +
> +		ksm_rmap_items += mm->ksm_rmap_items;
> +		ksm_merging_pages += mm->ksm_merging_pages;
> +		ksm_process_profits += ksm_process_profit(mm);
> +		mmput(mm);
> +	}
> +	css_task_iter_end(&it);
> +
> +	seq_printf(m, "merge any tasks: %u\n", tasks);
> +	seq_printf(m, "ksm_rmap_items %lu\n", ksm_rmap_items);
> +	seq_printf(m, "ksm_merging_pages %lu\n", ksm_merging_pages);
> +	seq_printf(m, "ksm_process_profits %ld\n", ksm_process_profits);
> +	return 0;
> +}
> +
> +static ssize_t memory_ksm_write(struct kernfs_open_file *of, char *buf,
> +				size_t nbytes, loff_t off)
> +{
> +	bool enable;
> +	int err;
> +	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
> +
> +	buf = strstrip(buf);
> +	if (!buf)
> +		return -EINVAL;
> +
> +	err = kstrtobool(buf, &enable);
> +	if (err)
> +		return err;
> +
> +	err = memcg_set_ksm_for_tasks(memcg, enable);
> +	if (err)
> +		return err;
> +
> +	return nbytes;
> +}
> +#endif /* CONFIG_KSM */
> +
>  static int memory_stat_show(struct seq_file *m, void *v);
>
>  static struct cftype mem_cgroup_legacy_files[] = {
> @@ -5138,6 +5236,14 @@ static struct cftype mem_cgroup_legacy_files[] = {
>  		.write = mem_cgroup_reset,
>  		.read_u64 = mem_cgroup_read_u64,
>  	},
> +#ifdef CONFIG_KSM
> +	{
> +		.name = "ksm",
> +		.flags = CFTYPE_NOT_ON_ROOT,
> +		.write = memory_ksm_write,
> +		.seq_show = memory_ksm_show,
> +	},
> +#endif
>  	{ },	/* terminate */
>  };
>
[PATCH OLK-6.6] memcg: support ksm merge any mode per cgroup
by Nanyong Sun 14 Dec '23

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I8OIQR

----------------------------------------------------------------------

Add control file "memory.ksm" to enable ksm per cgroup.
Echo to 1 will set all tasks currently in the cgroup to ksm merge
any mode, which means ksm gets enabled for all vma's of a process.
Meanwhile echo to 0 will disable ksm for them and unmerge the
merged pages.
Cat the file will show the above state and ksm related profits
of this cgroup.

Signed-off-by: Nanyong Sun <sunnanyong(a)huawei.com>
---
 .../admin-guide/cgroup-v1/memory.rst |   1 +
 mm/memcontrol.c                      | 110 +++++++++++++++++-
 2 files changed, 109 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
index ff456871bf4b..3fdb48435e8e 100644
--- a/Documentation/admin-guide/cgroup-v1/memory.rst
+++ b/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -109,6 +109,7 @@ Brief summary of control files.
  memory.kmem.tcp.failcnt             show the number of tcp buf memory usage
                                      hits limits
  memory.kmem.tcp.max_usage_in_bytes  show max tcp buf memory usage recorded
+ memory.ksm                          set/show ksm merge any mode
 ==================================== ==========================================

 1. History

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8a881ab21f6c..be37c2dda785 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -73,6 +73,7 @@
 #include <linux/uaccess.h>

 #include <trace/events/vmscan.h>
+#include <linux/ksm.h>

 struct cgroup_subsys memory_cgrp_subsys __read_mostly;
 EXPORT_SYMBOL(memory_cgrp_subsys);
@@ -230,10 +231,15 @@ enum res_type {
 	     iter != NULL;				\
 	     iter = mem_cgroup_iter(NULL, iter, NULL))

+static inline bool __task_is_dying(struct task_struct *task)
+{
+	return tsk_is_oom_victim(task) || fatal_signal_pending(task) ||
+		(task->flags & PF_EXITING);
+}
+
 static inline bool task_is_dying(void)
 {
-	return tsk_is_oom_victim(current) || fatal_signal_pending(current) ||
-		(current->flags & PF_EXITING);
+	return __task_is_dying(current);
 }

 /* Some nice accessors for the vmpressure. */
@@ -5010,6 +5016,98 @@ static int mem_cgroup_slab_show(struct seq_file *m, void *p)
 }
 #endif

+#ifdef CONFIG_KSM
+static int memcg_set_ksm_for_tasks(struct mem_cgroup *memcg, bool enable)
+{
+	struct task_struct *task;
+	struct mm_struct *mm;
+	struct css_task_iter it;
+	int ret = 0;
+
+	css_task_iter_start(&memcg->css, CSS_TASK_ITER_PROCS, &it);
+	while (!ret && (task = css_task_iter_next(&it))) {
+		if (__task_is_dying(task))
+			continue;
+
+		mm = get_task_mm(task);
+		if (!mm)
+			continue;
+
+		if (mmap_write_lock_killable(mm)) {
+			mmput(mm);
+			continue;
+		}
+
+		if (enable)
+			ret = ksm_enable_merge_any(mm);
+		else
+			ret = ksm_disable_merge_any(mm);
+
+		mmap_write_unlock(mm);
+		mmput(mm);
+	}
+	css_task_iter_end(&it);
+
+	return ret;
+}
+
+static int memory_ksm_show(struct seq_file *m, void *v)
+{
+	unsigned long ksm_merging_pages = 0;
+	unsigned long ksm_rmap_items = 0;
+	long ksm_process_profits = 0;
+	unsigned int tasks = 0;
+	struct task_struct *task;
+	struct mm_struct *mm;
+	struct css_task_iter it;
+	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
+
+	css_task_iter_start(&memcg->css, CSS_TASK_ITER_PROCS, &it);
+	while ((task = css_task_iter_next(&it))) {
+		mm = get_task_mm(task);
+		if (!mm)
+			continue;
+
+		if (test_bit(MMF_VM_MERGE_ANY, &mm->flags))
+			tasks++;
+
+		ksm_rmap_items += mm->ksm_rmap_items;
+		ksm_merging_pages += mm->ksm_merging_pages;
+		ksm_process_profits += ksm_process_profit(mm);
+		mmput(mm);
+	}
+	css_task_iter_end(&it);
+
+	seq_printf(m, "merge any tasks: %u\n", tasks);
+	seq_printf(m, "ksm_rmap_items %lu\n", ksm_rmap_items);
+	seq_printf(m, "ksm_merging_pages %lu\n", ksm_merging_pages);
+	seq_printf(m, "ksm_process_profits %ld\n", ksm_process_profits);
+	return 0;
+}
+
+static ssize_t memory_ksm_write(struct kernfs_open_file *of, char *buf,
+				size_t nbytes, loff_t off)
+{
+	bool enable;
+	int err;
+	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
+
+	buf = strstrip(buf);
+	if (!buf)
+		return -EINVAL;
+
+	err = kstrtobool(buf, &enable);
+	if (err)
+		return err;
+
+	err = memcg_set_ksm_for_tasks(memcg, enable);
+	if (err)
+		return err;
+
+	return nbytes;
+}
+#endif /* CONFIG_KSM */
+
 static int memory_stat_show(struct seq_file *m, void *v);

 static struct cftype mem_cgroup_legacy_files[] = {
@@ -5138,6 +5236,14 @@ static struct cftype mem_cgroup_legacy_files[] = {
 		.write = mem_cgroup_reset,
 		.read_u64 = mem_cgroup_read_u64,
 	},
+#ifdef CONFIG_KSM
+	{
+		.name = "ksm",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.write = memory_ksm_write,
+		.seq_show = memory_ksm_show,
+	},
+#endif
 	{ },	/* terminate */
 };
-- 
2.25.1
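memory_ksm_write() accepts the usual kernel boolean spellings via kstrtobool() before walking the cgroup's tasks. A user-space sketch of that parsing step (an illustrative re-implementation under the assumption that "1"/"0", "y"/"n" and "on"/"off" are accepted, not the kernel source itself):

```c
#include <errno.h>
#include <stdbool.h>

/* Sketch of kstrtobool()-style parsing as used by the memory.ksm
 * write handler: map common boolean spellings to true/false and
 * reject anything else with -EINVAL. */
static int sketch_strtobool(const char *s, bool *res)
{
    if (!s)
        return -EINVAL;

    switch (s[0]) {
    case 'y': case 'Y': case '1':
        *res = true;
        return 0;
    case 'n': case 'N': case '0':
        *res = false;
        return 0;
    case 'o': case 'O':
        switch (s[1]) {
        case 'n': case 'N':        /* "on" enables merge-any mode */
            *res = true;
            return 0;
        case 'f': case 'F':        /* "off" disables and unmerges */
            *res = false;
            return 0;
        }
        break;
    }
    return -EINVAL;
}
```

In the patch, a successful parse then drives memcg_set_ksm_for_tasks(), so `echo 1 > memory.ksm` enables merge-any mode for every task currently in the cgroup.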
[PATCH OLK-6.6 v2] arm64/ascend: Add new enable_oom_killer interface for oom contrl
by Yuan Can 14 Dec '23

From: Weilong Chen <chenweilong(a)huawei.com>

ascend inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I8NC0E
CVE: NA

-------------------------------------------------

Support disable oom-killer, and report oom events to bbox.

vm.enable_oom_killer:
0: disable oom killer
1: enable oom killer (default, compatible with mainline)

Signed-off-by: Weilong Chen <chenweilong(a)huawei.com>
---
 include/linux/oom.h | 24 +++++++++++++++++++
 mm/Kconfig          | 10 ++++++++
 mm/memcontrol.c     | 20 ++++++++++++++++
 mm/oom_kill.c       | 57 +++++++++++++++++++++++++++++++++++++++++++++
 mm/util.c           |  2 ++
 5 files changed, 113 insertions(+)

diff --git a/include/linux/oom.h b/include/linux/oom.h
index 7d0c9c48a0c5..b9210e272651 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -112,4 +112,28 @@ extern void oom_killer_enable(void);

 extern struct task_struct *find_lock_task_mm(struct task_struct *p);

+#define OOM_TYPE_NOMEM		0
+#define OOM_TYPE_OVERCOMMIT	1
+#define OOM_TYPE_CGROUP		2
+
+#ifdef CONFIG_ASCEND_OOM
+int register_hisi_oom_notifier(struct notifier_block *nb);
+int unregister_hisi_oom_notifier(struct notifier_block *nb);
+int oom_type_notifier_call(unsigned int type, struct oom_control *oc);
+#else
+static inline int register_hisi_oom_notifier(struct notifier_block *nb)
+{
+	return -EINVAL;
+}
+
+static inline int unregister_hisi_oom_notifier(struct notifier_block *nb)
+{
+	return -EINVAL;
+}
+
+static inline int oom_type_notifier_call(unsigned int type, struct oom_control *oc)
+{
+	return -EINVAL;
+}
+#endif
 #endif /* _INCLUDE_LINUX_OOM_H */
diff --git a/mm/Kconfig b/mm/Kconfig
index 0f68e5bbeb89..48f4aeeaeff9 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1302,6 +1302,16 @@ config SHARE_POOL
 	  in kernel and user level, which is only enabled for ascend platform.
 	  To enable this feature, enable_ascend_share_pool bootarg is needed.

+config ASCEND_OOM
+	bool "Enable support for disable oom killer"
+	default n
+	help
+	  In some cases we hopes that the oom will not kill the process when it occurs,
+	  be able to notify the black box to report the event, and be able to trigger
+	  the panic to locate the problem.
+	  vm.enable_oom_killer:
+	  0: disable oom killer
+	  1: enable oom killer (default, compatible with mainline)
+
 source "mm/damon/Kconfig"
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8a881ab21f6c..fec6f37e61da 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1955,6 +1955,7 @@ static bool mem_cgroup_oom(struct mem_cgroup *memcg, gfp_t mask, int order)
 		current->memcg_in_oom = memcg;
 		current->memcg_oom_gfp_mask = mask;
 		current->memcg_oom_order = order;
+		oom_type_notifier_call(OOM_TYPE_CGROUP, NULL);
 	}
 	return false;
 }
@@ -2019,6 +2020,8 @@ bool mem_cgroup_oom_synchronize(bool handle)
 	if (locked)
 		mem_cgroup_oom_notify(memcg);

+	oom_type_notifier_call(OOM_TYPE_CGROUP, NULL);
+
 	schedule();
 	mem_cgroup_unmark_under_oom(memcg);
 	finish_wait(&memcg_oom_waitq, &owait.wait);
@@ -3140,6 +3143,20 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
 	return ret;
 }

+#ifdef CONFIG_ASCEND_OOM
+void hisi_oom_recover(struct obj_cgroup *objcg)
+{
+	struct mem_cgroup *memcg;
+
+	memcg = get_mem_cgroup_from_objcg(objcg);
+	if (!mem_cgroup_is_root(memcg))
+		memcg_oom_recover(memcg);
+	css_put(&memcg->css);
+}
+#else
+static inline void hisi_oom_recover(struct obj_cgroup *objcg) { }
+#endif
+
 /**
  * __memcg_kmem_uncharge_page: uncharge a kmem page
  * @page: page to uncharge
@@ -3156,6 +3173,9 @@ void __memcg_kmem_uncharge_page(struct page *page, int order)
 	objcg = __folio_objcg(folio);
 	obj_cgroup_uncharge_pages(objcg, nr_pages);
+
+	hisi_oom_recover(objcg);
+
 	folio->memcg_data = 0;
 	obj_cgroup_put(objcg);
 }
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 44bde56ecd02..601ee56cc7d7 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -55,6 +55,7 @@
 static int sysctl_panic_on_oom;
 static int sysctl_oom_kill_allocating_task;
 static int sysctl_oom_dump_tasks = 1;
+static int sysctl_enable_oom_killer = 1;

 /*
  * Serializes oom killer invocations (out_of_memory()) from all contexts to
@@ -724,6 +725,17 @@ static struct ctl_table vm_oom_kill_table[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec,
 	},
+#ifdef CONFIG_ASCEND_OOM
+	{
+		.procname	= "enable_oom_killer",
+		.data		= &sysctl_enable_oom_killer,
+		.maxlen		= sizeof(sysctl_enable_oom_killer),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
+	},
+#endif
 	{}
 };
 #endif
@@ -1073,6 +1085,7 @@ static void check_panic_on_oom(struct oom_control *oc)
 	if (is_sysrq_oom(oc))
 		return;
 	dump_header(oc, NULL);
+	oom_type_notifier_call(OOM_TYPE_NOMEM, oc);
 	panic("Out of memory: %s panic_on_oom is enabled\n",
 		sysctl_panic_on_oom == 2 ? "compulsory" : "system-wide");
 }
@@ -1091,6 +1104,45 @@ int unregister_oom_notifier(struct notifier_block *nb)
 }
 EXPORT_SYMBOL_GPL(unregister_oom_notifier);

+#ifdef CONFIG_ASCEND_OOM
+static BLOCKING_NOTIFIER_HEAD(oom_type_notify_list);
+
+int register_hisi_oom_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_register(&oom_type_notify_list, nb);
+}
+EXPORT_SYMBOL_GPL(register_hisi_oom_notifier);
+
+int unregister_hisi_oom_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_unregister(&oom_type_notify_list, nb);
+}
+EXPORT_SYMBOL_GPL(unregister_hisi_oom_notifier);
+
+int oom_type_notifier_call(unsigned int type, struct oom_control *oc)
+{
+	struct oom_control oc_tmp = { 0 };
+	static unsigned long caller_jiffies;
+
+	if (sysctl_enable_oom_killer)
+		return -EINVAL;
+
+	if (oc)
+		type = is_memcg_oom(oc) ? OOM_TYPE_CGROUP : OOM_TYPE_NOMEM;
+	else
+		oc = &oc_tmp;
+
+	if (printk_timed_ratelimit(&caller_jiffies, 10000)) {
+		pr_err("OOM_NOTIFIER: oom type %u\n", type);
+		dump_stack();
+		show_mem();
+		dump_tasks(oc);
+	}
+
+	return blocking_notifier_call_chain(&oom_type_notify_list, type, NULL);
+}
+#endif
+
 /**
  * out_of_memory - kill the "best" process when we run out of memory
  * @oc: pointer to struct oom_control
@@ -1107,6 +1159,11 @@ bool out_of_memory(struct oom_control *oc)
 	if (oom_killer_disabled)
 		return false;

+	if (!sysctl_enable_oom_killer) {
+		oom_type_notifier_call(OOM_TYPE_NOMEM, oc);
+		return false;
+	}
+
 	if (!is_memcg_oom(oc)) {
 		blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
 		if (freed > 0 && !is_sysrq_oom(oc))
diff --git a/mm/util.c b/mm/util.c
index 90250cbc82fe..e41ac8a58eb5 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -26,6 +26,7 @@
 #include <linux/share_pool.h>

 #include <linux/uaccess.h>
+#include <linux/oom.h>

 #include "internal.h"
 #include "swap.h"
@@ -981,6 +982,7 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
 error:
 	pr_warn_ratelimited("%s: pid: %d, comm: %s, not enough memory for the allocation\n",
 			    __func__, current->pid, current->comm);
+	oom_type_notifier_call(OOM_TYPE_OVERCOMMIT, NULL);
 	vm_unacct_memory(pages);

 	return -ENOMEM;
-- 
2.17.1
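oom_type_notifier_call() rate-limits its diagnostic dump with printk_timed_ratelimit(), which fires at most once per interval. A small user-space model of that pattern (illustrative sketch with an explicit `now_ms` argument in place of jiffies; not the kernel implementation):

```c
#include <stdbool.h>

/* Model of the printk_timed_ratelimit() pattern: return true (allow
 * the event) on the first call and then at most once per interval,
 * remembering when the event last fired. */
static bool timed_ratelimit(unsigned long *last_ms, unsigned long now_ms,
                            unsigned long interval_ms)
{
    if (*last_ms == 0 || now_ms - *last_ms >= interval_ms) {
        *last_ms = now_ms;
        return true;    /* caller may print / notify */
    }
    return false;       /* suppressed inside the rate-limit window */
}
```

In the patch the interval is 10000 ms, so even if OOM events arrive in a burst, the stack dump and task list are logged at most once every ten seconds while every event still reaches the notifier chain.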
[PATCH OLK-6.6 0/8] kaslr: ppc64: Introduce KASLR for PPC64
by GUO Zihua 14 Dec '23

This patchset introduces KASLR for PowerPC64 chips.

GUO Zihua (2):
  kaslr: ppc64: Provide correct r5 value for relocated kernel
  powerpc: kaslr: Fix preserved memory size for int-vectors issue

Jason Yan (6):
  powerpc/fsl_booke/kaslr: refactor kaslr_legal_offset() and kaslr_early_init()
  powerpc/fsl_booke/64: introduce reloc_kernel_entry() helper
  powerpc/fsl_booke/64: implement KASLR for fsl_booke64
  powerpc/fsl_booke/64: do not clear the BSS for the second pass
  powerpc/fsl_booke/64: clear the original kernel if randomized
  powerpc/fsl_booke/kaslr: rename kaslr-booke32.rst to kaslr-booke.rst and add 64bit part

 Documentation/powerpc/index.rst            |  2 +-
 .../{kaslr-booke32.rst => kaslr-booke.rst} | 35 ++++++-
 arch/powerpc/Kconfig                       |  5 +-
 arch/powerpc/kernel/exceptions-64e.S       | 27 ++++++
 arch/powerpc/kernel/head_64.S              | 22 +++++
 arch/powerpc/kernel/prom.c                 |  8 +-
 arch/powerpc/kernel/setup_64.c             |  3 +
 arch/powerpc/mm/mmu_decl.h                 | 25 ++---
 arch/powerpc/mm/nohash/kaslr_booke.c       | 91 +++++++++++++------
 9 files changed, 169 insertions(+), 49 deletions(-)
 rename Documentation/powerpc/{kaslr-booke32.rst => kaslr-booke.rst} (59%)
-- 
2.34.1
[PATCH OLK-6.6] serial: amba-pl011: Fix serial port discard interrupt when interrupt signal line of serial port is connected to mbigen.
by Yuan Can 14 Dec '23

ascend inclusion
category: bugfix
Bugzilla: https://gitee.com/openeuler/kernel/issues/I8NC0E
CVE: N/A

---------------------------------------

When Hisi designed the Ascend chip, the serial port interrupt signal
lines were connected to the mbigen device, and the mbigen triggers the
SPI interrupt by writing the GICD_SETSPI_NSR register. This can result
in dropped serial port interrupts.

Signed-off-by: Xu Qiang <xuqiang36(a)huawei.com>
Signed-off-by: Yuan Can <yuancan(a)huawei.com>
---
 drivers/tty/serial/Kconfig      | 15 +++++++++++++++
 drivers/tty/serial/amba-pl011.c | 17 +++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/drivers/tty/serial/Kconfig b/drivers/tty/serial/Kconfig
index bdc568a4ab66..a2577253f774 100644
--- a/drivers/tty/serial/Kconfig
+++ b/drivers/tty/serial/Kconfig
@@ -73,6 +73,21 @@ config SERIAL_AMBA_PL011_CONSOLE
 	  your boot loader (lilo or loadlin) about how to pass options to the
 	  kernel at boot time.)

+config SERIAL_ATTACHED_MBIGEN
+	bool "Serial port interrupt signal lines connected to the mbigen"
+	depends on SERIAL_AMBA_PL011=y
+	depends on ASCEND_FEATURES
+	default n
+	help
+	  Say Y here when the interrupt signal line of the serial port is
+	  connected to the mbigen. The mbigen device has the function of
+	  clearing interrupts automatically. However, the interrupt processing
+	  function of the serial port driver may process multiple interrupts
+	  at a time. The mbigen device cannot adapt to this scenario.
+	  As a result, interrupts may be lost.
+
+	  If unsure, say N.
+
 config SERIAL_EARLYCON_SEMIHOST
 	bool "Early console using Arm compatible semihosting"
 	depends on ARM64 || ARM || RISCV
diff --git a/drivers/tty/serial/amba-pl011.c b/drivers/tty/serial/amba-pl011.c
index 3dc9b0fcab1c..713e3ef8738f 100644
--- a/drivers/tty/serial/amba-pl011.c
+++ b/drivers/tty/serial/amba-pl011.c
@@ -1548,6 +1548,18 @@ static void check_apply_cts_event_workaround(struct uart_amba_port *uap)
 	pl011_read(uap, REG_ICR);
 }

+#ifdef CONFIG_SERIAL_ATTACHED_MBIGEN
+static bool pl011_enable_hisi_wkrd;
+static int __init pl011_check_hisi_workaround_setup(char *str)
+{
+	pl011_enable_hisi_wkrd = 1;
+	return 0;
+}
+__setup("pl011_hisi_wkrd", pl011_check_hisi_workaround_setup);
+#else
+#define pl011_enable_hisi_wkrd 0
+#endif
+
 static irqreturn_t pl011_int(int irq, void *dev_id)
 {
 	struct uart_amba_port *uap = dev_id;
@@ -1585,6 +1597,11 @@ static irqreturn_t pl011_int(int irq, void *dev_id)
 		handled = 1;
 	}

+	if (pl011_enable_hisi_wkrd) {
+		pl011_write(0, uap, REG_IMSC);
+		pl011_write(uap->im, uap, REG_IMSC);
+	}
+
 	spin_unlock_irqrestore(&uap->port.lock, flags);

 	return IRQ_RETVAL(handled);
-- 
2.17.1
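The workaround at the end of pl011_int() writes 0 to the interrupt mask register and then restores the saved mask, forcing the mbigen-routed interrupt line to be re-evaluated so an event that arrived mid-handler is not dropped. A toy model of that mask-toggle pattern (a fake register struct, purely illustrative; `fake_uart` and `hisi_wkrd_kick` are invented names for this sketch):

```c
#include <stdint.h>

/* Fake UART register model: unmasking (writing a non-zero mask)
 * stands in for the hardware re-evaluating pending interrupt state. */
struct fake_uart {
    uint32_t imsc;      /* interrupt mask set/clear register */
    int reassert_count; /* how many times the line was re-evaluated */
};

static void write_imsc(struct fake_uart *u, uint32_t val)
{
    u->imsc = val;
    if (val)                  /* unmasking re-evaluates pending state */
        u->reassert_count++;
}

/* The workaround: mask everything, then restore the saved mask,
 * retriggering any level interrupt that is still pending. */
static void hisi_wkrd_kick(struct fake_uart *u, uint32_t saved_im)
{
    write_imsc(u, 0);
    write_imsc(u, saved_im);
}
```

The point of the design is that the toggle is idempotent: the mask ends up exactly where it started, and the only side effect is one extra evaluation of the pending interrupt state.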
[PATCH V2 OLK-6.6 0/6] Add support for hisi HBM devices
by Zhang Zekun 14 Dec '23

Add support for HISI HBM device, and HBM ACLS repair. The patch set
includes three functionalities:
1. Add support for hbm device hotplug.
2. Provide extra information with hbm locality information.
3. Add support for HBM acls repair.

Zhang Zekun (6):
  ACPI: OSL: Export the symbol of acpi_hotplug_schedule
  soc: hisilicon: hisi_hbmdev: Add power domain control methods
  ACPI: memhotplug: export the state of each hotplug device
  soc: hisilicon: hisi_hbmdev: Provide extra memory topology information
  soc: hbmcache: Add support for online and offline the hbm cache
  soc: hisilicon: hisi_hbmdev: Add hbm acls repair and query methods

 drivers/acpi/acpi_memhotplug.c        |  30 ++
 drivers/acpi/internal.h               |   1 -
 drivers/acpi/osl.c                    |   1 +
 drivers/base/container.c              |   3 +
 drivers/soc/hisilicon/Kconfig         |  36 +++
 drivers/soc/hisilicon/Makefile        |   3 +
 drivers/soc/hisilicon/hisi_hbmcache.c | 147 +++++++++
 drivers/soc/hisilicon/hisi_hbmdev.c   | 435 ++++++++++++++++++++++++++
 drivers/soc/hisilicon/hisi_internal.h |  31 ++
 include/linux/acpi.h                  |   1 +
 include/linux/memory_hotplug.h        |   4 +
 11 files changed, 691 insertions(+), 1 deletion(-)
 create mode 100644 drivers/soc/hisilicon/hisi_hbmcache.c
 create mode 100644 drivers/soc/hisilicon/hisi_hbmdev.c
 create mode 100644 drivers/soc/hisilicon/hisi_internal.h
-- 
2.17.1
[PATCH OLK-6.6] fs/dirty_pages: dump the number of dirty pages for each inode
by Zizhi Wo 14 Dec '23

hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I8OHM4?from=project-issue --------------------------- In order to analyse the IO performance when using buffer IO, it's useful to obtain the number of dirty pages for each inode in the filesystem. This feature depends on 'CONFIG_DIRTY_PAGES'. It creates 2 interfaces by using profs. /proc/dirty/page_threshold to filter result and /proc/dirty/dirty_list to get dirty pages. Signed-off-by: Zizhi Wo <wozizhi(a)huawei.com> --- arch/arm64/configs/openeuler_defconfig | 1 + arch/x86/configs/openeuler_defconfig | 1 + fs/Kconfig | 13 ++ fs/Makefile | 1 + fs/dirty_pages.c | 275 +++++++++++++++++++++++++ 5 files changed, 291 insertions(+) create mode 100644 fs/dirty_pages.c diff --git a/arch/arm64/configs/openeuler_defconfig b/arch/arm64/configs/openeuler_defconfig index 3781d138c3e3..ce101e7d8fcc 100644 --- a/arch/arm64/configs/openeuler_defconfig +++ b/arch/arm64/configs/openeuler_defconfig @@ -6881,6 +6881,7 @@ CONFIG_PROC_PAGE_MONITOR=y CONFIG_PROC_CHILDREN=y CONFIG_KERNFS=y CONFIG_SYSFS=y +CONFIG_DIRTY_PAGES=y CONFIG_TMPFS=y CONFIG_TMPFS_POSIX_ACL=y CONFIG_TMPFS_XATTR=y diff --git a/arch/x86/configs/openeuler_defconfig b/arch/x86/configs/openeuler_defconfig index e19bf53c0bd9..76b87ed62579 100644 --- a/arch/x86/configs/openeuler_defconfig +++ b/arch/x86/configs/openeuler_defconfig @@ -8088,6 +8088,7 @@ CONFIG_PROC_PID_ARCH_STATUS=y CONFIG_PROC_CPU_RESCTRL=y CONFIG_KERNFS=y CONFIG_SYSFS=y +CONFIG_DIRTY_PAGES=y CONFIG_TMPFS=y CONFIG_TMPFS_POSIX_ACL=y CONFIG_TMPFS_XATTR=y diff --git a/fs/Kconfig b/fs/Kconfig index aa7e03cc1941..a2280cf98729 100644 --- a/fs/Kconfig +++ b/fs/Kconfig @@ -170,6 +170,19 @@ source "fs/proc/Kconfig" source "fs/kernfs/Kconfig" source "fs/sysfs/Kconfig" +config DIRTY_PAGES + bool "Dumps the number of dirty pages of each file" + depends on PROC_FS + default y + help + This config supports the rendering of dirty page data to the user, + which may be useful to analyze 
the IO performance when using buffer + IO. + + It create 3 interfaces by using procfs. /proc/dirty/buffer_size for + buffer allocation and release; /proc/dirty/page_threshold to filter + result; /proc/dirty/dirty_list to get dirty pages. + config TMPFS bool "Tmpfs virtual memory file system support (former shm fs)" depends on SHMEM diff --git a/fs/Makefile b/fs/Makefile index f9541f40be4e..6246e173e1e4 100644 --- a/fs/Makefile +++ b/fs/Makefile @@ -44,6 +44,7 @@ obj-$(CONFIG_FS_POSIX_ACL) += posix_acl.o obj-$(CONFIG_NFS_COMMON) += nfs_common/ obj-$(CONFIG_COREDUMP) += coredump.o obj-$(CONFIG_SYSCTL) += drop_caches.o sysctls.o +obj-$(CONFIG_DIRTY_PAGES) += dirty_pages.o obj-$(CONFIG_FHANDLE) += fhandle.o obj-y += iomap/ diff --git a/fs/dirty_pages.c b/fs/dirty_pages.c new file mode 100644 index 000000000000..4edcba2bf1d0 --- /dev/null +++ b/fs/dirty_pages.c @@ -0,0 +1,275 @@ +#include <linux/proc_fs.h> +#include <linux/seq_file.h> +#include <linux/uaccess.h> +#include <linux/pagemap.h> +#include <linux/pagevec.h> +#include <linux/fs.h> +#include <linux/mm.h> +#include <linux/slab.h> +#include <linux/sched.h> +#include <linux/proc_fs.h> +#include <linux/kdev_t.h> +#include <linux/vmalloc.h> +#include <linux/module.h> +#include <linux/kernel.h> +#include <linux/init.h> +#include "internal.h" + +static int buff_limit; /* filter threshold of dirty pages*/ + +static struct proc_dir_entry *dirty_dir; + +/* proc root directory */ +#define DIRTY_ROOT "dirty" +/* proc file to obtain diry pages of each inode */ +#define DIRTY_PAGES "dirty_list" +/* proc file to filter result */ +#define DIRTY_LIMIT "page_threshold" + +static void seq_set_overflow(struct seq_file *m) +{ + m->count = m->size; +} + +static unsigned long dump_dirtypages_inode(struct inode *inode) +{ + struct folio_batch fbatch; + unsigned long nr_dirtys = 0; + unsigned int nr_folios; + pgoff_t index = 0; + + folio_batch_init(&fbatch); + + while (1) { + nr_folios = filemap_get_folios_tag(inode->i_mapping, &index, 
+ (pgoff_t)-1, PAGECACHE_TAG_DIRTY, &fbatch); + if (!nr_folios) + break; + + folio_batch_release(&fbatch); + cond_resched(); + + nr_dirtys += nr_folios; + } + + return nr_dirtys; +} + +static char *inode_filename(struct inode *inode, char *tmpname) +{ + struct dentry *dentry; + char *filename; + + dentry = d_find_alias(inode); + if (!dentry) + return ERR_PTR(-ENOENT); + + tmpname[PATH_MAX-1] = '\0'; + filename = dentry_path_raw(dentry, tmpname, PATH_MAX); + + dput(dentry); + + return filename; +} + +static inline bool is_sb_writable(struct super_block *sb) +{ + if (sb_rdonly(sb)) + return false; + + if (sb->s_writers.frozen == SB_FREEZE_COMPLETE) + return false; + + return true; +} + +/* + * dump_dirtypages_sb - dump the dirty pages of each inode in the sb + * @sb the super block + * @m the seq_file witch is initialized in proc_dpages_open + * + * For each inode in the sb, call dump_dirtypages_pages to get the number + * of dirty pages. And use seq_printf to store the result in the buffer + * if it's not less than the threshold. The inode in unusual state will + * be skipped. + */ +static void dump_dirtypages_sb(struct super_block *sb, struct seq_file *m) +{ + struct inode *inode, *toput_inode = NULL; + unsigned long nr_dirtys; + const char *fstype; + char *filename; + char *tmpname; + int limit = READ_ONCE(buff_limit); + + if (!is_sb_writable(sb)) + return; + + tmpname = kmalloc(PATH_MAX, GFP_KERNEL); + if (!tmpname) + return; + + spin_lock(&sb->s_inode_list_lock); + list_for_each_entry(inode, &sb->s_inodes, i_sb_list) { + spin_lock(&inode->i_lock); + + /* + * We must skip inodes in unusual state. We may also skip + * inodes without pages but we deliberately won't in case + * we need to reschedule to avoid softlockups. 
+	 */
+	if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) ||
+	    (inode->i_mapping->nrpages == 0 && !need_resched())) {
+		spin_unlock(&inode->i_lock);
+		continue;
+	}
+	__iget(inode);
+	spin_unlock(&inode->i_lock);
+	spin_unlock(&sb->s_inode_list_lock);
+
+	cond_resched();
+
+	nr_dirtys = dump_dirtypages_inode(inode);
+	if (!nr_dirtys || nr_dirtys < limit)
+		goto skip;
+
+	filename = inode_filename(inode, tmpname);
+	if (IS_ERR_OR_NULL(filename))
+		filename = "unknown";
+
+	if (sb->s_type && sb->s_type->name)
+		fstype = sb->s_type->name;
+	else
+		fstype = "unknown";
+	/*
+	 * seq_printf returns nothing; if the buffer is exhausted
+	 * (m->size <= m->count), seq_printf will not store
+	 * anything, just set m->count = m->size and return. In
+	 * that case, log a warning in the buffer to remind users.
+	 */
+	if (m->size <= m->count) {
+		seq_set_overflow(m);
+		strncpy(m->buf+m->count-12, "terminated\n\0", 12);
+		iput(inode);
+		goto done;
+	}
+	seq_printf(m, "FSType: %s, Dev ID: %u(%u:%u) ino %lu, dirty pages %lu, path %s\n",
+		   fstype, sb->s_dev, MAJOR(sb->s_dev),
+		   MINOR(sb->s_dev), inode->i_ino,
+		   nr_dirtys, filename);
+skip:
+	iput(toput_inode);
+	toput_inode = inode;
+	spin_lock(&sb->s_inode_list_lock);
+	}
+	spin_unlock(&sb->s_inode_list_lock);
+done:
+	iput(toput_inode);
+	kfree(tmpname);
+}
+
+static int proc_dpages_show(struct seq_file *m, void *v)
+{
+	iterate_supers((void *)dump_dirtypages_sb, (void *)m);
+	return 0;
+}
+
+static int proc_dpages_open(struct inode *inode, struct file *filp)
+{
+	return single_open(filp, proc_dpages_show, NULL);
+}
+
+static const struct proc_ops proc_dpages_operations = {
+	.proc_open = proc_dpages_open,
+	.proc_read = seq_read,
+	.proc_release = single_release,
+};
+
+static int proc_limit_show(struct seq_file *m, void *v)
+{
+	seq_printf(m, "%d\n", READ_ONCE(buff_limit));
+	return 0;
+}
+
+static int proc_limit_open(struct inode *inode, struct file *filp)
+{
+	return single_open(filp, proc_limit_show, NULL);
+}
+
+static ssize_t
+write_limit_proc(
+	struct file *filp,
+	const char *buf,
+	size_t count,
+	loff_t *offp)
+{
+	char *msg;
+	int ret = 0;
+	long temp;
+
+	if (count > PAGE_SIZE) {
+		ret = -EINVAL;
+		goto error;
+	}
+
+	msg = kzalloc(PAGE_SIZE, GFP_KERNEL);
+	if (!msg) {
+		ret = -ENOMEM;
+		goto error;
+	}
+
+	if (copy_from_user(msg, buf, count)) {
+		ret = -EINVAL;
+		goto free;
+	}
+	ret = kstrtol(msg, 10, &temp);
+	if (ret != 0 || temp < 0 || temp > INT_MAX) {
+		ret = -EINVAL;
+		goto free;
+	}
+
+	WRITE_ONCE(buff_limit, temp);
+	ret = count;
+
+free:
+	kfree(msg);
+error:
+	return ret;
+}
+
+static const struct proc_ops proc_limit_operations = {
+	.proc_open = proc_limit_open,
+	.proc_read = seq_read,
+	.proc_write = write_limit_proc,
+	.proc_lseek = seq_lseek,
+	.proc_release = single_release,
+};
+
+static int __init dpages_proc_init(void)
+{
+	static struct proc_dir_entry *proc_file;
+
+	dirty_dir = proc_mkdir(DIRTY_ROOT, NULL);
+	if (!dirty_dir)
+		goto fail_dir;
+
+	proc_file = proc_create(DIRTY_PAGES, 0440,
+				dirty_dir, &proc_dpages_operations);
+	if (!proc_file)
+		goto fail_pages;
+
+	proc_file = proc_create(DIRTY_LIMIT, 0640,
+				dirty_dir, &proc_limit_operations);
+	if (!proc_file)
+		goto fail_limit;
+
+	return 0;
+
+fail_limit:
+	remove_proc_entry(DIRTY_PAGES, dirty_dir);
+fail_pages:
+	remove_proc_entry(DIRTY_ROOT, NULL);
+fail_dir:
+	return -ENOMEM;
+}
+
+subsys_initcall(dpages_proc_init);
-- 
2.39.2
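Each record the patch emits through seq_printf follows a fixed format string, so a userspace script can parse the dump mechanically. A minimal sketch of such a parser; the sample line, device numbers, and path below are illustrative assumptions, not output captured from the patch:

```python
import re

# Pattern mirrors the kernel format string in the patch:
# "FSType: %s, Dev ID: %u(%u:%u) ino %lu, dirty pages %lu, path %s\n"
LINE_RE = re.compile(
    r"FSType: (?P<fstype>\S+), Dev ID: (?P<dev>\d+)"
    r"\((?P<major>\d+):(?P<minor>\d+)\) ino (?P<ino>\d+), "
    r"dirty pages (?P<ndirty>\d+), path (?P<path>.*)"
)

def parse_dirty_line(line):
    """Parse one dump record into a dict; return None for lines that
    do not match (e.g. the "terminated" overflow marker)."""
    m = LINE_RE.match(line.rstrip("\n"))
    if not m:
        return None
    rec = m.groupdict()
    for key in ("dev", "major", "minor", "ino", "ndirty"):
        rec[key] = int(rec[key])
    return rec
```

Such a parser could, for instance, sort the records by the `ndirty` field to find the inodes holding the most dirty pages.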
[PATCH OLK-6.6 0/2] sync pin memory and pid reserve's code from
by Liu Chao 14 Dec '23

euleros inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I8NXQM

------------------------------

Liu Chao (2):
  mm: add pin memory method for checkpoint and restore
  pid: add pid reserve method for checkpoint and recover

 arch/arm64/configs/openeuler_defconfig |    3 +
 arch/arm64/kernel/setup.c              |    2 +
 arch/arm64/mm/init.c                   |    6 +
 drivers/char/Kconfig                   |    7 +
 drivers/char/Makefile                  |    1 +
 drivers/char/pin_memory.c              |  209 +++
 fs/proc/task_mmu.c                     |  136 +++
 include/linux/page-flags.h             |    9 +
 include/linux/pin_mem.h                |  117 +++
 include/trace/events/mmflags.h         |    9 +-
 kernel/pid.c                           |    4 +
 mm/Kconfig                             |   19 +
 mm/Makefile                            |    1 +
 mm/huge_memory.c                       |   64 ++
 mm/memory.c                            |   59 ++
 mm/pin_mem.c                           | 1194 ++++++++++++++++++++++++
 mm/rmap.c                              |    3 +-
 17 files changed, 1841 insertions(+), 2 deletions(-)
 create mode 100644 drivers/char/pin_memory.c
 create mode 100644 include/linux/pin_mem.h
 create mode 100644 mm/pin_mem.c
-- 
2.33.0
[PATCH OLK-6.6] ACPI / APEI: Notify all ras err to driver
by Yuan Can 14 Dec '23

From: Weilong Chen <chenweilong(a)huawei.com>

ascend inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I8NC0E
CVE: NA

-------------------------------------------------

Customization delivers all types of error to the driver, as the driver
needs to process the errors in process context.

Signed-off-by: Weilong Chen <chenweilong(a)huawei.com>
---
 arch/arm64/configs/openeuler_defconfig | 1 +
 drivers/acpi/apei/Kconfig              | 7 +++++++
 drivers/acpi/apei/ghes.c               | 8 +++++++-
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/configs/openeuler_defconfig b/arch/arm64/configs/openeuler_defconfig
index 3781d138c3e3..5775069d6844 100644
--- a/arch/arm64/configs/openeuler_defconfig
+++ b/arch/arm64/configs/openeuler_defconfig
@@ -653,6 +653,7 @@ CONFIG_ACPI_HMAT=y
 CONFIG_HAVE_ACPI_APEI=y
 CONFIG_ACPI_APEI=y
 CONFIG_ACPI_APEI_GHES=y
+CONFIG_ACPI_APEI_GHES_NOTIFY_ALL_RAS_ERR=y
 CONFIG_ACPI_APEI_PCIEAER=y
 CONFIG_ACPI_APEI_SEA=y
 CONFIG_ACPI_APEI_MEMORY_FAILURE=y
diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
index 6b18f8bc7be3..1dce3ad7c9bd 100644
--- a/drivers/acpi/apei/Kconfig
+++ b/drivers/acpi/apei/Kconfig
@@ -33,6 +33,13 @@ config ACPI_APEI_GHES
 	  by firmware to produce more valuable hardware error
 	  information for Linux.
 
+config ACPI_APEI_GHES_NOTIFY_ALL_RAS_ERR
+	bool "Notify all ras err to driver"
+	depends on ARM64 && ACPI_APEI_GHES
+	default n
+	help
+	  Deliver all types of error to driver.
+
 config ACPI_APEI_PCIEAER
 	bool "APEI PCIe AER logging/recovering support"
 	depends on ACPI_APEI && PCIEAER
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 63ad0541db38..bf1b9252a8da 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -692,12 +692,18 @@ static bool ghes_do_proc(struct ghes *ghes,
 			queued = ghes_handle_arm_hw_error(gdata, sev);
 		} else {
 			void *err = acpi_hest_get_payload(gdata);
-
+#ifndef CONFIG_ACPI_APEI_GHES_NOTIFY_ALL_RAS_ERR
 			ghes_defer_non_standard_event(gdata, sev);
+#endif
 			log_non_standard_event(sec_type, fru_id, fru_text,
 					       sec_sev, err,
 					       gdata->error_data_length);
 		}
+
+#ifdef CONFIG_ACPI_APEI_GHES_NOTIFY_ALL_RAS_ERR
+		/* Customization deliver all types error to driver. */
+		ghes_defer_non_standard_event(gdata, sev);
+#endif
 	}
 
 	return queued;
-- 
2.17.1
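The behavioural change the Kconfig option makes to ghes_do_proc()'s dispatch can be summarized in a few lines. This is a simplified model, not kernel code; the section-type names are illustrative:

```python
def deferred_sections(sections, notify_all_ras_err):
    """Model which error sections ghes_do_proc() defers to the driver.

    Without CONFIG_ACPI_APEI_GHES_NOTIFY_ALL_RAS_ERR only non-standard
    sections are deferred; with it, every section is handed to the
    driver so the error can be processed in process context.
    """
    deferred = []
    for sec_type in sections:
        if notify_all_ras_err:
            deferred.append(sec_type)    # all types delivered
        elif sec_type == "non-standard":
            deferred.append(sec_type)    # default: non-standard only
    return deferred
```

For example, with sections ["memory", "arm", "non-standard"], the default configuration defers only the non-standard one, while the new option defers all three.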