From: Hui Tang tanghui20@huawei.com
hulk inclusion category: bugfix bugzilla: 187419, https://gitee.com/openeuler/kernel/issues/I612FD
-------------------------------
==================================================================
BUG: KASAN: null-ptr-deref in rq_of kernel/sched/sched.h:1118 [inline]
BUG: KASAN: null-ptr-deref in unthrottle_qos_sched_group kernel/sched/fair.c:7619 [inline]
BUG: KASAN: null-ptr-deref in free_fair_sched_group+0x124/0x320 kernel/sched/fair.c:12131
Read of size 8 at addr 0000000000000130 by task syz-executor100/223

CPU: 3 PID: 223 Comm: syz-executor100 Not tainted 5.10.0 #6
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace+0x0/0x40c arch/arm64/kernel/stacktrace.c:132
 show_stack+0x30/0x40 arch/arm64/kernel/stacktrace.c:196
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1b4/0x248 lib/dump_stack.c:118
 __kasan_report mm/kasan/report.c:551 [inline]
 kasan_report+0x18c/0x210 mm/kasan/report.c:564
 check_memory_region_inline mm/kasan/generic.c:187 [inline]
 __asan_load8+0x98/0xc0 mm/kasan/generic.c:253
 rq_of kernel/sched/sched.h:1118 [inline]
 unthrottle_qos_sched_group kernel/sched/fair.c:7619 [inline]
 free_fair_sched_group+0x124/0x320 kernel/sched/fair.c:12131
 sched_free_group kernel/sched/core.c:7767 [inline]
 sched_create_group+0x48/0xc0 kernel/sched/core.c:7798
 cpu_cgroup_css_alloc+0x18/0x40 kernel/sched/core.c:7930
 css_create+0x7c/0x4a0 kernel/cgroup/cgroup.c:5328
 cgroup_apply_control_enable+0x288/0x340 kernel/cgroup/cgroup.c:3135
 cgroup_apply_control kernel/cgroup/cgroup.c:3217 [inline]
 cgroup_subtree_control_write+0x668/0x8b0 kernel/cgroup/cgroup.c:3375
 cgroup_file_write+0x1a8/0x37c kernel/cgroup/cgroup.c:3909
 kernfs_fop_write_iter+0x220/0x2f4 fs/kernfs/file.c:296
 call_write_iter include/linux/fs.h:1960 [inline]
 new_sync_write+0x260/0x370 fs/read_write.c:515
 vfs_write+0x3dc/0x4ac fs/read_write.c:602
 ksys_write+0xfc/0x200 fs/read_write.c:655
 __do_sys_write fs/read_write.c:667 [inline]
 __se_sys_write fs/read_write.c:664 [inline]
 __arm64_sys_write+0x50/0x60 fs/read_write.c:664
 __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline]
 invoke_syscall arch/arm64/kernel/syscall.c:48 [inline]
 el0_svc_common.constprop.0+0xf4/0x414 arch/arm64/kernel/syscall.c:155
 do_el0_svc+0x50/0x11c arch/arm64/kernel/syscall.c:217
 el0_svc+0x20/0x30 arch/arm64/kernel/entry-common.c:353
 el0_sync_handler+0xe4/0x1e0 arch/arm64/kernel/entry-common.c:369
 el0_sync+0x148/0x180 arch/arm64/kernel/entry.S:683
So add a check for tg->cfs_rq[i] before unthrottle_qos_sched_group() is called.
Signed-off-by: Hui Tang tanghui20@huawei.com Signed-off-by: Zheng Zucheng zhengzucheng@huawei.com Reviewed-by: Zhang Qiao zhangqiao22@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- kernel/sched/fair.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 7fa2eecac1f4..654e964b5c31 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -12157,7 +12157,7 @@ void free_fair_sched_group(struct task_group *tg)
for_each_possible_cpu(i) { #ifdef CONFIG_QOS_SCHED - if (tg->cfs_rq) + if (tg->cfs_rq && tg->cfs_rq[i]) unthrottle_qos_sched_group(tg->cfs_rq[i]); #endif if (tg->cfs_rq)
From: Zhao Wenhui zhaowenhui8@huawei.com
hulk inclusion category: bugfix bugzilla: 187464, https://gitee.com/openeuler/kernel/issues/I612GU
--------------------------------
sched/fair: limit burst to zero when cfs bandwidth is toggled off
When the quota value in CFS bandwidth is set to -1, that implies the cfs bandwidth is toggled off, so the burst feature is supposed to be disabled as well.
Currently, when quota is -1, burst is not checked, so it can be set to an almost arbitrary value. Example:

  mkdir /sys/fs/cgroup/cpu/test
  echo -1 > /sys/fs/cgroup/cpu/test/cpu.cfs_quota_us
  echo 10000000000000000 > /sys/fs/cgroup/cpu/test/cpu.cfs_burst_us
Moreover, after burst has been modified this way, quota can no longer be set to any valid value:

  echo 100000 > cpu.cfs_quota_us
  -bash: echo: write error: Invalid argument
This patch ensures that the burst value is zero and cannot be altered while quota is set to -1.
Fixes: f4183717b370 ("sched/fair: Introduce the burstable CFS controller") Signed-off-by: Zhao Wenhui zhaowenhui8@huawei.com Signed-off-by: Zheng Zucheng zhengzucheng@huawei.com Reviewed-by: Zhang Qiao zhangqiao22@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- kernel/sched/core.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 181453cf3e7a..62d14fba4ca6 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -9120,6 +9120,12 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota, burst + quota > max_cfs_runtime)) return -EINVAL;
+ /* + * Ensure burst equals to zero when quota is -1. + */ + if (quota == RUNTIME_INF && burst) + return -EINVAL; + /* * Prevent race between setting of cfs_rq->runtime_enabled and * unthrottle_offline_cfs_rqs(). @@ -9179,8 +9185,10 @@ static int tg_set_cfs_quota(struct task_group *tg, long cfs_quota_us)
period = ktime_to_ns(tg->cfs_bandwidth.period); burst = tg->cfs_bandwidth.burst; - if (cfs_quota_us < 0) + if (cfs_quota_us < 0) { quota = RUNTIME_INF; + burst = 0; + } else if ((u64)cfs_quota_us <= U64_MAX / NSEC_PER_USEC) quota = (u64)cfs_quota_us * NSEC_PER_USEC; else
From: Alistair Popple apopple@nvidia.com
mainline inclusion from mainline-v5.17-rc1 commit ffa65753c43142f3b803486442813744da71cff2 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I614X1 CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
This fixes the FIXME in migrate_vma_check_page().
Before migrating a page, migration code will take a reference and check there are no unexpected page references, failing the migration if there are. When a thread faults on a migration entry it will take a temporary reference to the page to wait for the page to become unlocked, signifying the migration entry has been removed.
This reference is dropped just prior to waiting on the page lock, however the extra reference can cause migration failures so it is desirable to avoid taking it.
As migration code already has a reference to the migrating page an extra reference to wait on PG_locked is unnecessary so long as the reference can't be dropped whilst setting up the wait.
When faulting on a migration entry the ptl is taken to check the migration entry. Removing a migration entry also requires the ptl, and migration code won't drop its page reference until after the migration entry has been removed. Therefore retaining the ptl of a migration entry is sufficient to ensure the page has a reference. Reworking migration_entry_wait() to hold the ptl until the wait setup is complete means the extra page reference is no longer needed.
[apopple@nvidia.com: v5] Link: https://lkml.kernel.org/r/20211213033848.1973946-1-apopple@nvidia.com
Link: https://lkml.kernel.org/r/20211118020754.954425-1-apopple@nvidia.com Signed-off-by: Alistair Popple apopple@nvidia.com Acked-by: David Hildenbrand david@redhat.com Cc: David Howells dhowells@redhat.com Cc: Hugh Dickins hughd@google.com Cc: Jason Gunthorpe jgg@nvidia.com Cc: Jerome Glisse jglisse@redhat.com Cc: John Hubbard jhubbard@nvidia.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Ralph Campbell rcampbell@nvidia.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Conflicts: include/linux/migrate.h.rej mm/filemap.c.rej mm/migrate.c.rej
Signed-off-by: Chen Wandun chenwandun@huawei.com Reviewed-by: Nanyong Sun sunnanyong@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- include/linux/migrate.h | 2 + mm/filemap.c | 91 +++++++++++++++++++++++++++++++++++++++++ mm/migrate.c | 39 ++---------------- 3 files changed, 97 insertions(+), 35 deletions(-)
diff --git a/include/linux/migrate.h b/include/linux/migrate.h index a9de6d3ae07d..ade4993f5fab 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -56,6 +56,8 @@ extern int migrate_huge_page_move_mapping(struct address_space *mapping, struct page *newpage, struct page *page); extern int migrate_page_move_mapping(struct address_space *mapping, struct page *newpage, struct page *page, int extra_count); +void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep, + spinlock_t *ptl); #else
static inline void putback_movable_pages(struct list_head *l) {} diff --git a/mm/filemap.c b/mm/filemap.c index 1b502f2e5253..f8e1ee68ecac 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -21,6 +21,7 @@ #include <linux/gfp.h> #include <linux/mm.h> #include <linux/swap.h> +#include <linux/swapops.h> #include <linux/mman.h> #include <linux/pagemap.h> #include <linux/file.h> @@ -42,6 +43,7 @@ #include <linux/psi.h> #include <linux/ramfs.h> #include <linux/page_idle.h> +#include <linux/migrate.h> #include "internal.h"
#define CREATE_TRACE_POINTS @@ -1323,6 +1325,95 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q, return wait->flags & WQ_FLAG_WOKEN ? 0 : -EINTR; }
+#ifdef CONFIG_MIGRATION +/** + * migration_entry_wait_on_locked - Wait for a migration entry to be removed + * @entry: migration swap entry. + * @ptep: mapped pte pointer. Will return with the ptep unmapped. Only required + * for pte entries, pass NULL for pmd entries. + * @ptl: already locked ptl. This function will drop the lock. + * + * Wait for a migration entry referencing the given page to be removed. This is + * equivalent to put_and_wait_on_page_locked(page, TASK_UNINTERRUPTIBLE) except + * this can be called without taking a reference on the page. Instead this + * should be called while holding the ptl for the migration entry referencing + * the page. + * + * Returns after unmapping and unlocking the pte/ptl with pte_unmap_unlock(). + * + * This follows the same logic as wait_on_page_bit_common() so see the comments + * there. + */ +void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep, + spinlock_t *ptl) +{ + struct wait_page_queue wait_page; + wait_queue_entry_t *wait = &wait_page.wait; + bool thrashing = false; + bool delayacct = false; + unsigned long pflags; + wait_queue_head_t *q; + struct page *page = compound_head(migration_entry_to_page(entry)); + + q = page_waitqueue(page); + if (!PageUptodate(page) && PageWorkingset(page)) { + if (!PageSwapBacked(page)) { + delayacct_thrashing_start(); + delayacct = true; + } + psi_memstall_enter(&pflags); + thrashing = true; + } + + init_wait(wait); + wait->func = wake_page_function; + wait_page.page = page; + wait_page.bit_nr = PG_locked; + wait->flags = 0; + + spin_lock_irq(&q->lock); + SetPageWaiters(page); + if (!trylock_page_bit_common(page, PG_locked, wait)) + __add_wait_queue_entry_tail(q, wait); + spin_unlock_irq(&q->lock); + + /* + * If a migration entry exists for the page the migration path must hold + * a valid reference to the page, and it must take the ptl to remove the + * migration entry. So the page is valid until the ptl is dropped. + */ + if (ptep) + pte_unmap_unlock(ptep, ptl); + else + spin_unlock(ptl); + + for (;;) { + unsigned int flags; + + set_current_state(TASK_UNINTERRUPTIBLE); + + /* Loop until we've been woken or interrupted */ + flags = smp_load_acquire(&wait->flags); + if (!(flags & WQ_FLAG_WOKEN)) { + if (signal_pending_state(TASK_UNINTERRUPTIBLE, current)) + break; + + io_schedule(); + continue; + } + break; + } + + finish_wait(q, wait); + + if (thrashing) { + if (delayacct) + delayacct_thrashing_end(); + psi_memstall_leave(&pflags); + } +} +#endif + void wait_on_page_bit(struct page *page, int bit_nr) { wait_queue_head_t *q = page_waitqueue(page); diff --git a/mm/migrate.c b/mm/migrate.c index 3f6c76b97989..b3c47cc9b622 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -315,7 +315,6 @@ void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep, { pte_t pte; swp_entry_t entry; - struct page *page;
spin_lock(ptl); pte = *ptep; @@ -326,18 +325,7 @@ void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep, if (!is_migration_entry(entry)) goto out;
- page = migration_entry_to_page(entry); - page = compound_head(page); - - /* - * Once page cache replacement of page migration started, page_count - * is zero; but we must not call put_and_wait_on_page_locked() without - * a ref. Use get_page_unless_zero(), and just fault again if it fails. - */ - if (!get_page_unless_zero(page)) - goto out; - pte_unmap_unlock(ptep, ptl); - put_and_wait_on_page_locked(page); + migration_entry_wait_on_locked(entry, ptep, ptl); return; out: pte_unmap_unlock(ptep, ptl); @@ -362,16 +350,11 @@ void migration_entry_wait_huge(struct vm_area_struct *vma, void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd) { spinlock_t *ptl; - struct page *page;
ptl = pmd_lock(mm, pmd); if (!is_pmd_migration_entry(*pmd)) goto unlock; - page = migration_entry_to_page(pmd_to_swp_entry(*pmd)); - if (!get_page_unless_zero(page)) - goto unlock; - spin_unlock(ptl); - put_and_wait_on_page_locked(page); + migration_entry_wait_on_locked(pmd_to_swp_entry(*pmd), NULL, ptl); return; unlock: spin_unlock(ptl); @@ -2558,22 +2541,8 @@ static bool migrate_vma_check_page(struct page *page, struct page *fault_page) return false;
/* Page from ZONE_DEVICE have one extra reference */ - if (is_zone_device_page(page)) { - /* - * Private page can never be pin as they have no valid pte and - * GUP will fail for those. Yet if there is a pending migration - * a thread might try to wait on the pte migration entry and - * will bump the page reference count. Sadly there is no way to - * differentiate a regular pin from migration wait. Hence to - * avoid 2 racing thread trying to migrate back to CPU to enter - * infinite loop (one stopping migration because the other is - * waiting on pte migration entry). We always return true here. - * - * FIXME proper solution is to rework migration_entry_wait() so - * it does not need to take a reference on page. - */ - return is_device_private_page(page); - } + if (is_zone_device_page(page)) + extra++;
/* For file back page */ if (page_mapping(page))
From: Wang ShaoBo bobo.shaobowang@huawei.com
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I61CPK CVE: NA
--------------------------------
Writing a bandwidth control value to /sys/fs/resctrl/schemata may result in an incorrect value being stored and read back.
For instance:
$ cat /sys/fs/resctrl/schemata
    L3:0=7fff;1=7fff;2=7fff;3=7fff
    MB:0=100;1=100;2=100;3=100
$ echo 'MB:0=20' > /sys/fs/resctrl/schemata
$ cat /sys/fs/resctrl/schemata
    L3:0=7fff;1=7fff;2=7fff;3=7fff
    MB:0=18;1=100;2=100;3=100
Assuming that bwa_wd (given by MPAMF_MBW_IDR.BWA_WD) is greater than or equal to 6 (i.e. the fraction is 64), we can divide mbw_max/min with a granularity of at least 2 in the range 0 to 100, if we calculate mbw_max/min with formula 1:

  mbw_max/min = (Input * range) / Scale

We then recover Input with formula 2:

  Input = (mbw_max/min * Scale) / range

But if the rounding deviation in formula 1 is too large, we can no longer use formula 2 to get back the original value at a granularity of 2. Therefore, we need to correct the calculation result of formula 1.
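To make the deviation concrete, here is a small stand-alone user-space C sketch (not kernel code; range = 64 and Scale = 100 are assumed purely for illustration) that reproduces the 20 -> 18 example above and shows the effect of the remainder correction:

  #include <stdio.h>

  /*
   * Assumed values for illustration only: a 6-bit bandwidth fraction
   * (range = 64) and a 0..100 percentage scale, matching the formulas
   * in the commit message.
   */
  #define RANGE 64
  #define SCALE 100

  int main(void)
  {
          unsigned int input = 20;        /* user writes MB:0=20 */

          /* Formula 1 without correction: 20 * 64 / 100 = 12. */
          unsigned int hw = (input * RANGE) / SCALE;
          /* Formula 2: 12 * 100 / 64 = 18, not the 20 that was written. */
          unsigned int back = (hw * SCALE) / RANGE;

          printf("without correction: %u -> hw %u -> read back %u\n",
                 input, hw, back);

          /*
           * With the remainder correction from this patch, hw becomes 13
           * and 13 * 100 / 64 = 20, within the granularity of 2.
           */
          hw = (input * RANGE) / SCALE + ((input * RANGE) % SCALE) / RANGE;
          back = (hw * SCALE) / RANGE;

          printf("with correction:    %u -> hw %u -> read back %u\n",
                 input, hw, back);

          return 0;
  }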
Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com Reviewed-by: Xie XiuQi xiexiuqi@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- arch/arm64/kernel/mpam/mpam_resctrl.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/arch/arm64/kernel/mpam/mpam_resctrl.c b/arch/arm64/kernel/mpam/mpam_resctrl.c index e9e77064bdb2..d00746c08922 100644 --- a/arch/arm64/kernel/mpam/mpam_resctrl.c +++ b/arch/arm64/kernel/mpam/mpam_resctrl.c @@ -2280,6 +2280,8 @@ mpam_update_from_resctrl_cfg(struct mpam_resctrl_res *res, case QOS_MBA_MAX_EVENT_ID: range = MBW_MAX_BWA_FRACT(res->class->bwa_wd); mpam_cfg->mbw_max = (resctrl_cfg * range) / (MAX_MBA_BW - 1); + /* correct mbw_max if remainder is too large */ + mpam_cfg->mbw_max += ((resctrl_cfg * range) % (MAX_MBA_BW - 1)) / range; mpam_cfg->mbw_max = (mpam_cfg->mbw_max > range) ? range : mpam_cfg->mbw_max; mpam_set_feature(mpam_feat_mbw_max, &mpam_cfg->valid); @@ -2287,6 +2289,8 @@ mpam_update_from_resctrl_cfg(struct mpam_resctrl_res *res, case QOS_MBA_MIN_EVENT_ID: range = MBW_MAX_BWA_FRACT(res->class->bwa_wd); mpam_cfg->mbw_min = (resctrl_cfg * range) / (MAX_MBA_BW - 1); + /* correct mbw_min if remainder is too large */ + mpam_cfg->mbw_min += ((resctrl_cfg * range) % (MAX_MBA_BW - 1)) / range; mpam_cfg->mbw_min = (mpam_cfg->mbw_min > range) ? range : mpam_cfg->mbw_min; mpam_set_feature(mpam_feat_mbw_min, &mpam_cfg->valid);
From: Wang ShaoBo bobo.shaobowang@huawei.com
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I61CPK CVE: NA
--------------------------------
mpam_resctrl_setup_domain() increases dom_num when a domain goes online. Correspondingly, mpam_resctrl_cpu_offline() should decrease dom_num when a domain goes offline.
Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com Reviewed-by: Xie XiuQi xiexiuqi@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- arch/arm64/kernel/mpam/mpam_setup.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/arch/arm64/kernel/mpam/mpam_setup.c b/arch/arm64/kernel/mpam/mpam_setup.c index d30910e0cda2..6e71c99d19b0 100644 --- a/arch/arm64/kernel/mpam/mpam_setup.c +++ b/arch/arm64/kernel/mpam/mpam_setup.c @@ -172,6 +172,8 @@ int mpam_resctrl_cpu_offline(unsigned int cpu) list_del(&d->list); dom = container_of(d, struct mpam_resctrl_dom, resctrl_dom); kfree(dom); + + res->resctrl_res.dom_num--; }
mpam_resctrl_clear_default_cpu(cpu);
From: Wang ShaoBo bobo.shaobowang@huawei.com
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I61CPK CVE: NA
--------------------------------
This fixes unreachable code and inconsistent indentation in resctrl_parse_param():

  fs/resctrlfs.c:616 resctrl_parse_param() warn: ignoring unreachable code.
  fs/resctrlfs.c:616 resctrl_parse_param() warn: inconsistent indenting
  fs/resctrlfs.c:619 resctrl_parse_param() warn: inconsistent indenting
Fixes: 100e2317e9b9 ("arm64/mpam: Use fs_context to parse mount options") Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com Reviewed-by: Xie XiuQi xiexiuqi@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- fs/resctrlfs.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/fs/resctrlfs.c b/fs/resctrlfs.c index 8956237de47f..c0a84f40dcc0 100644 --- a/fs/resctrlfs.c +++ b/fs/resctrlfs.c @@ -622,12 +622,11 @@ static int resctrl_parse_param(struct fs_context *fc, struct fs_parameter *param case Opt_caPrio: ctx->enable_caPrio = true; return 0; + default: + break; + }
- return 0; -} - -return -EINVAL; - + return -EINVAL; }
static void resctrl_fs_context_free(struct fs_context *fc)
From: Wang ShaoBo bobo.shaobowang@huawei.com
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I61CPK CVE: NA
--------------------------------
Do not allow min_bw to be below the granularity when adjusting mbw_max/min, and when mbw_max/min is set to a value less than min_bw, return 'Invalid argument' directly.
Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com Reviewed-by: Xie XiuQi xiexiuqi@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- arch/arm64/kernel/mpam/mpam_resctrl.c | 8 ++------ arch/arm64/kernel/mpam/mpam_setup.c | 3 +++ 2 files changed, 5 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/kernel/mpam/mpam_resctrl.c b/arch/arm64/kernel/mpam/mpam_resctrl.c index d00746c08922..f9a360a4c718 100644 --- a/arch/arm64/kernel/mpam/mpam_resctrl.c +++ b/arch/arm64/kernel/mpam/mpam_resctrl.c @@ -338,15 +338,11 @@ parse_bw(char *buf, struct resctrl_resource *r, switch (rr->ctrl_features[type].evt) { case QOS_MBA_MAX_EVENT_ID: case QOS_MBA_PBM_EVENT_ID: - if (kstrtoul(buf, rr->ctrl_features[type].base, &data)) - return -EINVAL; - data = (data < r->mbw.min_bw) ? r->mbw.min_bw : data; - data = roundup(data, r->mbw.bw_gran); - break; case QOS_MBA_MIN_EVENT_ID: if (kstrtoul(buf, rr->ctrl_features[type].base, &data)) return -EINVAL; - /* for mbw min feature, 0 of setting is allowed */ + if (data < r->mbw.min_bw) + return -EINVAL; data = roundup(data, r->mbw.bw_gran); break; default: diff --git a/arch/arm64/kernel/mpam/mpam_setup.c b/arch/arm64/kernel/mpam/mpam_setup.c index 6e71c99d19b0..cef66a5e6f2f 100644 --- a/arch/arm64/kernel/mpam/mpam_setup.c +++ b/arch/arm64/kernel/mpam/mpam_setup.c @@ -419,6 +419,9 @@ static int mpam_resctrl_resource_init(struct mpam_resctrl_res *res) * of 1 would appear too fine to make percentage conversions. */ r->mbw.bw_gran = GRAN_MBA_BW; + /* do not allow mbw_max/min below mbw.bw_gran */ + if (r->mbw.min_bw < r->mbw.bw_gran) + r->mbw.min_bw = r->mbw.bw_gran;
/* We will only pick a class that can monitor and control */ r->alloc_capable = true;
From: Wang ShaoBo bobo.shaobowang@huawei.com
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I61CPK CVE: NA
--------------------------------
Refer to the two commits:
commit fd8d9db3559a ("x86/resctrl: Remove superfluous kernfs_get() calls to prevent refcount leak")
commit 758999246965 ("x86/resctrl: Add necessary kernfs_put() calls to prevent refcount leak")
There are some places where a kernfs_node reference is obtained without a corresponding release. These excess references on kernfs nodes are never dropped to 0, so the kernfs nodes are never freed in the call paths of rmdir and umount. This leads to a reference count leak and a kernfs_node_cache memory leak.
For example, reference count leak is observed in these two cases:
(1) mount -t resctrl none /sys/fs/resctrl
    mkdir /sys/fs/resctrl/c1
    mkdir /sys/fs/resctrl/c1/mon_groups/m1
    umount /sys/fs/resctrl
(2) mkdir /sys/fs/resctrl/c1
    mkdir /sys/fs/resctrl/c1/mon_groups/m1
    rmdir /sys/fs/resctrl/c1
Superfluous kernfs_get() calls are removed from two areas:
(1) In call paths of mount and mkdir, when kernfs nodes for "info", "mon_groups" and "mon_data" directories and sub-directories are created, the reference count of the newly created kernfs node is set to 1. But after kernfs_create_dir() returns, superfluous kernfs_get() calls are made to take an additional reference.
(2) kernfs_get() calls in rmdir call paths.
Necessary kernfs_put() calls are added by following changes:
(1) Introduce rdtgroup removal helper rdtgroup_remove() to wrap up kernfs_put() and kfree().
(2) Call rdtgroup_remove() in rdtgroup removal path where the rdtgroup structure is about to be freed by kfree().
(3) Call rdtgroup_remove() or kernfs_put() as appropriate in the error exit paths of mkdir where an extra reference is taken by kernfs_get().
Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com Signed-off-by: Jialin Zhang zhangjialin11@huawei.com Reviewed-by: Xie XiuQi xiexiuqi@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- arch/arm64/include/asm/resctrl.h | 18 ++++++++++++++++ arch/arm64/kernel/mpam/mpam_ctrlmon.c | 8 ------- arch/arm64/kernel/mpam/mpam_resctrl.c | 2 +- fs/resctrlfs.c | 31 ++++++--------------------- 4 files changed, 25 insertions(+), 34 deletions(-)
diff --git a/arch/arm64/include/asm/resctrl.h b/arch/arm64/include/asm/resctrl.h index 1175c3515c92..d3bdb43b662f 100644 --- a/arch/arm64/include/asm/resctrl.h +++ b/arch/arm64/include/asm/resctrl.h @@ -545,5 +545,23 @@ DEFINE_INLINE_CTRL_FEATURE_ENABLE_FUNC(caPbm); DEFINE_INLINE_CTRL_FEATURE_ENABLE_FUNC(caMax); DEFINE_INLINE_CTRL_FEATURE_ENABLE_FUNC(caPrio);
+/** + * rdtgroup_remove - the helper to remove resource group safely + * @rdtgrp: resource group to remove + * + * On resource group creation via a mkdir, an extra kernfs_node reference is + * taken to ensure that the rdtgroup structure remains accessible for the + * rdtgroup_kn_unlock() calls where it is removed. + * + * Drop the extra reference here, then free the rdtgroup structure. + * + * Return: void + */ +static inline void rdtgroup_remove(struct rdtgroup *rdtgrp) +{ + kernfs_put(rdtgrp->kn); + kfree(rdtgrp); +} + #endif #endif /* _ASM_ARM64_RESCTRL_H */ diff --git a/arch/arm64/kernel/mpam/mpam_ctrlmon.c b/arch/arm64/kernel/mpam/mpam_ctrlmon.c index 724bed6a8e2c..8abbf2823269 100644 --- a/arch/arm64/kernel/mpam/mpam_ctrlmon.c +++ b/arch/arm64/kernel/mpam/mpam_ctrlmon.c @@ -804,7 +804,6 @@ static int resctrl_group_mkdir_info_resdir(struct resctrl_resource *r, if (IS_ERR(kn_subdir)) return PTR_ERR(kn_subdir);
- kernfs_get(kn_subdir); ret = resctrl_group_kn_set_ugid(kn_subdir); if (ret) return ret; @@ -830,7 +829,6 @@ int resctrl_group_create_info_dir(struct kernfs_node *parent_kn, *kn_info = kernfs_create_dir(parent_kn, "info", parent_kn->mode, NULL); if (IS_ERR(*kn_info)) return PTR_ERR(*kn_info); - kernfs_get(*kn_info);
ret = resctrl_group_add_files(*kn_info, RF_TOP_INFO); if (ret) @@ -865,12 +863,6 @@ int resctrl_group_create_info_dir(struct kernfs_node *parent_kn, } }
- /* - m This extra ref will be put in kernfs_remove() and guarantees - * that @rdtgrp->kn is always accessible. - */ - kernfs_get(*kn_info); - ret = resctrl_group_kn_set_ugid(*kn_info); if (ret) goto out_destroy; diff --git a/arch/arm64/kernel/mpam/mpam_resctrl.c b/arch/arm64/kernel/mpam/mpam_resctrl.c index f9a360a4c718..7370f4dcecce 100644 --- a/arch/arm64/kernel/mpam/mpam_resctrl.c +++ b/arch/arm64/kernel/mpam/mpam_resctrl.c @@ -1331,7 +1331,7 @@ static void move_myself(struct callback_head *head) (rdtgrp->flags & RDT_DELETED)) { current->closid = 0; current->rmid = 0; - kfree(rdtgrp); + rdtgroup_remove(rdtgrp); }
preempt_disable(); diff --git a/fs/resctrlfs.c b/fs/resctrlfs.c index c0a84f40dcc0..e02e4769edc0 100644 --- a/fs/resctrlfs.c +++ b/fs/resctrlfs.c @@ -211,8 +211,7 @@ void resctrl_group_kn_unlock(struct kernfs_node *kn) if (atomic_dec_and_test(&rdtgrp->waitcount) && (rdtgrp->flags & RDT_DELETED)) { kernfs_unbreak_active_protection(kn); - kernfs_put(rdtgrp->kn); - kfree(rdtgrp); + rdtgroup_remove(rdtgrp); } else { kernfs_unbreak_active_protection(kn); } @@ -272,12 +271,6 @@ mongroup_create_dir(struct kernfs_node *parent_kn, struct resctrl_group *prgrp, if (dest_kn) *dest_kn = kn;
- /* - * This extra ref will be put in kernfs_remove() and guarantees - * that @rdtgrp->kn is always accessible. - */ - kernfs_get(kn); - ret = resctrl_group_kn_set_ugid(kn); if (ret) goto out_destroy; @@ -399,8 +392,6 @@ static int resctrl_get_tree(struct fs_context *fc) if (ret) goto out_info;
- kernfs_get(kn_mongrp); - ret = mkdir_mondata_all_prepare(&resctrl_group_default); if (ret < 0) goto out_mongrp; @@ -410,7 +401,6 @@ static int resctrl_get_tree(struct fs_context *fc) if (ret) goto out_mongrp;
- kernfs_get(kn_mondata); resctrl_group_default.mon.mon_data_kn = kn_mondata; }
@@ -495,7 +485,7 @@ static void free_all_child_rdtgrp(struct resctrl_group *rdtgrp) /* rmid may not be used */ rmid_free(sentry->mon.rmid); list_del(&sentry->mon.crdtgrp_list); - kfree(sentry); + rdtgroup_remove(sentry); } }
@@ -529,7 +519,7 @@ static void rmdir_all_sub(void)
kernfs_remove(rdtgrp->kn); list_del(&rdtgrp->resctrl_group_list); - kfree(rdtgrp); + rdtgroup_remove(rdtgrp); } /* Notify online CPUs to update per cpu storage and PQR_ASSOC MSR */ update_closid_rmid(cpu_online_mask, &resctrl_group_default); @@ -775,7 +765,7 @@ static int mkdir_resctrl_prepare(struct kernfs_node *parent_kn, * kernfs_remove() will drop the reference count on "kn" which * will free it. But we still need it to stick around for the * resctrl_group_kn_unlock(kn} call below. Take one extra reference - * here, which will be dropped inside resctrl_group_kn_unlock(). + * here, which will be dropped inside rdtgroup_remove(). */ kernfs_get(kn);
@@ -815,6 +805,7 @@ static int mkdir_resctrl_prepare(struct kernfs_node *parent_kn, out_prepare_clean: mkdir_mondata_all_prepare_clean(rdtgrp); out_destroy: + kernfs_put(rdtgrp->kn); kernfs_remove(rdtgrp->kn); out_free_rmid: rmid_free(rdtgrp->mon.rmid); @@ -831,7 +822,7 @@ static int mkdir_resctrl_prepare(struct kernfs_node *parent_kn, static void mkdir_resctrl_prepare_clean(struct resctrl_group *rgrp) { kernfs_remove(rgrp->kn); - kfree(rgrp); + rdtgroup_remove(rgrp); }
/* @@ -996,11 +987,6 @@ static int resctrl_group_rmdir_mon(struct kernfs_node *kn, struct resctrl_group { resctrl_group_rm_mon(rdtgrp, tmpmask);
- /* - * one extra hold on this, will drop when we kfree(rdtgrp) - * in resctrl_group_kn_unlock() - */ - kernfs_get(kn); kernfs_remove(rdtgrp->kn);
return 0; @@ -1049,11 +1035,6 @@ static int resctrl_group_rmdir_ctrl(struct kernfs_node *kn, struct resctrl_group { resctrl_group_rm_ctrl(rdtgrp, tmpmask);
- /* - * one extra hold on this, will drop when we kfree(rdtgrp) - * in resctrl_group_kn_unlock() - */ - kernfs_get(kn); kernfs_remove(rdtgrp->kn);
return 0;
From: Jialin Zhang zhangjialin11@huawei.com
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I61CPK CVE: NA
--------------------------------
Update last_cmd_status to report the reason for returning 'Invalid argument' in parse_cache() and parse_bw().
Signed-off-by: Jialin Zhang zhangjialin11@huawei.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com Reviewed-by: Xie XiuQi xiexiuqi@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- arch/arm64/kernel/mpam/mpam_resctrl.c | 26 ++++++++++++++++++++------ 1 file changed, 20 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/kernel/mpam/mpam_resctrl.c b/arch/arm64/kernel/mpam/mpam_resctrl.c index 7370f4dcecce..4a1e376ec497 100644 --- a/arch/arm64/kernel/mpam/mpam_resctrl.c +++ b/arch/arm64/kernel/mpam/mpam_resctrl.c @@ -310,11 +310,15 @@ parse_cache(char *buf, struct resctrl_resource *r, return -EINVAL; }
- if (kstrtoul(buf, rr->ctrl_features[type].base, &data)) + if (kstrtoul(buf, rr->ctrl_features[type].base, &data)) { + rdt_last_cmd_printf("Non-hex character in the mask %s\n", buf); return -EINVAL; + }
- if (data >= rr->ctrl_features[type].max_wd) + if (data >= rr->ctrl_features[type].max_wd) { + rdt_last_cmd_puts("Mask out of range\n"); return -EINVAL; + }
cfg->new_ctrl[type] = data; cfg->have_new_ctrl = true; @@ -339,20 +343,30 @@ parse_bw(char *buf, struct resctrl_resource *r, case QOS_MBA_MAX_EVENT_ID: case QOS_MBA_PBM_EVENT_ID: case QOS_MBA_MIN_EVENT_ID: - if (kstrtoul(buf, rr->ctrl_features[type].base, &data)) + if (kstrtoul(buf, rr->ctrl_features[type].base, &data)) { + rdt_last_cmd_printf("Non-decimal digit in MB value %s\n", buf); return -EINVAL; - if (data < r->mbw.min_bw) + } + if (data < r->mbw.min_bw) { + rdt_last_cmd_printf("MB value %ld out of range [%d,%d]\n", data, + r->mbw.min_bw, rr->ctrl_features[type].max_wd - 1); return -EINVAL; + } data = roundup(data, r->mbw.bw_gran); break; default: - if (kstrtoul(buf, rr->ctrl_features[type].base, &data)) + if (kstrtoul(buf, rr->ctrl_features[type].base, &data)) { + rdt_last_cmd_printf("Non-decimal digit in MB value %s\n", buf); return -EINVAL; + } break; }
- if (data >= rr->ctrl_features[type].max_wd) + if (data >= rr->ctrl_features[type].max_wd) { + rdt_last_cmd_printf("MB value %ld out of range [%d,%d]\n", data, + r->mbw.min_bw, rr->ctrl_features[type].max_wd - 1); return -EINVAL; + }
cfg->new_ctrl[type] = data; cfg->have_new_ctrl = true;
From: Peter Zijlstra peterz@infradead.org
mainline inclusion from mainline-v5.11-rc1 commit 55d2eba8e7cd439c11cdb204898c2d384227629b category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I61CQ3
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
When the static_key is part of the module, and the module calls static_key_inc/enable() from its __init section *AND* has a static_branch_*() user in that very same __init section, things go wobbly.
If the static_key lives outside the module, jump_label_add_module() would append this module's sites to the key and jump_label_update() would take the static_key_linked() branch and all would be fine.
If all the sites are outside of __init, then everything will be fine too.
However, when all is aligned just as described above, jump_label_update() calls __jump_label_update(.init = false) and we'll not update sites in __init text.
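As a rough illustration (a hypothetical demo module, not taken from this patch), the problematic pattern looks like the sketch below: the static key is defined inside the module, and both the enable call and a static_branch_*() user sit in the module's __init section.

  // SPDX-License-Identifier: GPL-2.0
  /* Hypothetical module sketch, for illustration only. */
  #include <linux/module.h>
  #include <linux/jump_label.h>

  static DEFINE_STATIC_KEY_FALSE(demo_key);  /* key is part of the module */

  static int __init demo_init(void)
  {
          static_branch_enable(&demo_key);   /* triggers jump_label_update() */

          /*
           * This branch site lives in __init text. Before this fix, a module
           * loaded after boot saw init == false in jump_label_update(), so
           * this entry was skipped and the branch still read as disabled.
           */
          if (static_branch_likely(&demo_key))
                  pr_info("demo: key observed as enabled in __init\n");
          else
                  pr_info("demo: key still observed as disabled in __init\n");

          return 0;
  }

  static void __exit demo_exit(void)
  {
  }

  module_init(demo_init);
  module_exit(demo_exit);
  MODULE_LICENSE("GPL");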
Fixes: 19483677684b ("jump_label: Annotate entries that operate on __init code earlier") Reported-by: Dexuan Cui decui@microsoft.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Acked-by: Josh Poimboeuf jpoimboe@redhat.com Tested-by: Jessica Yu jeyu@kernel.org Link: https://lkml.kernel.org/r/20201216135435.GV3092@hirez.programming.kicks-ass.... Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com Reviewed-by: Xie XiuQi xiexiuqi@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- kernel/jump_label.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/kernel/jump_label.c b/kernel/jump_label.c index 7470cdc432a0..600dbdab24e7 100644 --- a/kernel/jump_label.c +++ b/kernel/jump_label.c @@ -823,6 +823,7 @@ int jump_label_text_reserved(void *start, void *end) static void jump_label_update(struct static_key *key) { struct jump_entry *stop = __stop___jump_table; + bool init = system_state < SYSTEM_RUNNING; struct jump_entry *entry; #ifdef CONFIG_MODULES struct module *mod; @@ -834,15 +835,16 @@ static void jump_label_update(struct static_key *key)
preempt_disable(); mod = __module_address((unsigned long)key); - if (mod) + if (mod) { stop = mod->jump_entries + mod->num_jump_entries; + init = mod->state == MODULE_STATE_COMING; + } preempt_enable(); #endif entry = static_key_entries(key); /* if there are no users, entry can be NULL */ if (entry) - __jump_label_update(key, entry, stop, - system_state < SYSTEM_RUNNING); + __jump_label_update(key, entry, stop, init); }
#ifdef CONFIG_STATIC_KEYS_SELFTEST
From: Zhouyi Zhou zhouzhouyi@gmail.com
mainline inclusion from mainline-v5.12 commit 0c89d87d1d43d9fa268d1dc489518564d58bf497 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I61CQ3
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
Commit 40607ee97e4e ("preempt/dynamic: Provide irqentry_exit_cond_resched() static call") tried to provide the irqentry_exit_cond_resched() static call in irqentry_exit(), but had a typo in the macro conditional statement.
Fixes: 40607ee97e4e ("preempt/dynamic: Provide irqentry_exit_cond_resched() static call") Signed-off-by: Zhouyi Zhou zhouzhouyi@gmail.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Link: https://lkml.kernel.org/r/20210410073523.5493-1-zhouzhouyi@gmail.com Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com Reviewed-by: Xie XiuQi xiexiuqi@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- kernel/entry/common.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/entry/common.c b/kernel/entry/common.c index cea3957ebdbc..2228de39bb4f 100644 --- a/kernel/entry/common.c +++ b/kernel/entry/common.c @@ -394,7 +394,7 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
instrumentation_begin(); if (IS_ENABLED(CONFIG_PREEMPTION)) { -#ifdef CONFIG_PREEMT_DYNAMIC +#ifdef CONFIG_PREEMPT_DYNAMIC static_call(irqentry_exit_cond_resched)(); #else irqentry_exit_cond_resched();
From: Johan Hovold johan@kernel.org
mainline inclusion from mainline-v5.15-rc6 commit 57116ce17b04fde2fe30f0859df69d8dbe5809f6 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I61CQ3
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
-------------------------------------------------
Console drivers often queue work while holding locks also taken in their console write paths, something which can lead to deadlocks on SMP when dumping workqueue state (e.g. sysrq-t or on suspend failures).
For serial console drivers this could look like:
    CPU0                                CPU1
    ----                                ----

    show_workqueue_state();
      lock(&pool->lock);
                                        <IRQ>
                                          lock(&port->lock);
                                          schedule_work();
                                            lock(&pool->lock);
    printk();
      lock(console_owner);
      lock(&port->lock);
where workqueues are, for example, used to push data to the line discipline, process break signals and handle modem-status changes. Line disciplines and serdev drivers can also queue work on write-wakeup notifications, etc.
Reworking every console driver to avoid queuing work while holding locks also taken in their write paths would complicate drivers and is neither desirable nor feasible.
Instead use the deferred-printk mechanism to avoid printing while holding pool locks when dumping workqueue state.
Note that there are a few WARN_ON() assertions in the workqueue code which could potentially also trigger a deadlock. Hopefully the ongoing printk rework will provide a general solution for this eventually.
This was originally reported after a lockdep splat when executing sysrq-t with the imx serial driver.
Fixes: 3494fc30846d ("workqueue: dump workqueues on sysrq-t") Cc: stable@vger.kernel.org # 4.0 Reported-by: Fabio Estevam festevam@denx.de Tested-by: Fabio Estevam festevam@denx.de Signed-off-by: Johan Hovold johan@kernel.org Reviewed-by: John Ogness john.ogness@linutronix.de Signed-off-by: Tejun Heo tj@kernel.org Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com Reviewed-by: Xie XiuQi xiexiuqi@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- kernel/workqueue.c | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 422ee6312475..a27605c17f07 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -4812,8 +4812,16 @@ void show_workqueue_state(void)
for_each_pwq(pwq, wq) { raw_spin_lock_irqsave(&pwq->pool->lock, flags); - if (pwq->nr_active || !list_empty(&pwq->delayed_works)) + if (pwq->nr_active || !list_empty(&pwq->delayed_works)) { + /* + * Defer printing to avoid deadlocks in console + * drivers that queue work while holding locks + * also taken in their write paths. + */ + printk_safe_enter(); show_pwq(pwq); + printk_safe_exit(); + } raw_spin_unlock_irqrestore(&pwq->pool->lock, flags); /* * We could be printing a lot from atomic context, e.g. @@ -4831,7 +4839,12 @@ void show_workqueue_state(void) raw_spin_lock_irqsave(&pool->lock, flags); if (pool->nr_workers == pool->nr_idle) goto next_pool; - + /* + * Defer printing to avoid deadlocks in console drivers that + * queue work while holding locks also taken in their write + * paths. + */ + printk_safe_enter(); pr_info("pool %d:", pool->id); pr_cont_pool_info(pool); pr_cont(" hung=%us workers=%d", @@ -4846,6 +4859,7 @@ void show_workqueue_state(void) first = false; } pr_cont("\n"); + printk_safe_exit(); next_pool: raw_spin_unlock_irqrestore(&pool->lock, flags); /*
From: Sungwoo Kim iam@sung-woo.kim
maillist inclusion category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I63D3E CVE: CVE-2022-45934
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?...
--------------------------------
By repeatedly sending L2CAP_CONF_REQ packets, chan->num_conf_rsp is incremented multiple times and eventually wraps around its maximum value (i.e., 255). This patch prevents that by adding a boundary check against L2CAP_CONF_MAX_CONF_RSP.
Btmon log:
Bluetooth monitor ver 5.64
= Note: Linux version 6.1.0-rc2 (x86_64)                                0.264594
= Note: Bluetooth subsystem version 2.22                                0.264636
@ MGMT Open: btmon (privileged) version 1.22                   {0x0001} 0.272191
= New Index: 00:00:00:00:00:00 (Primary,Virtual,hci0)            [hci0] 13.877604
@ RAW Open: 9496 (privileged) version 2.22                     {0x0002} 13.890741
= Open Index: 00:00:00:00:00:00                                  [hci0] 13.900426
(...)
> ACL Data RX: Handle 200 flags 0x00 dlen 1033               #32 [hci0] 14.273106
      invalid packet size (12 != 1033)
        08 00 01 00 02 01 04 00 01 10 ff ff              ............
> ACL Data RX: Handle 200 flags 0x00 dlen 1547               #33 [hci0] 14.273561
      invalid packet size (14 != 1547)
        0a 00 01 00 04 01 06 00 40 00 00 00 00 00        ........@.....
> ACL Data RX: Handle 200 flags 0x00 dlen 2061               #34 [hci0] 14.274390
      invalid packet size (16 != 2061)
        0c 00 01 00 04 01 08 00 40 00 00 00 00 00 00 04  ........@.......
> ACL Data RX: Handle 200 flags 0x00 dlen 2061               #35 [hci0] 14.274932
      invalid packet size (16 != 2061)
        0c 00 01 00 04 01 08 00 40 00 00 00 07 00 03 00  ........@.......
= bluetoothd: Bluetooth daemon 5.43                                    14.401828
> ACL Data RX: Handle 200 flags 0x00 dlen 1033               #36 [hci0] 14.275753
      invalid packet size (12 != 1033)
        08 00 01 00 04 01 04 00 40 00 00 00              ........@...
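For illustration only, a small stand-alone user-space C sketch of the wrap-around that this patch guards against (the L2CAP_CONF_MAX_CONF_RSP value of 2 is assumed here, not taken from this diff):

  #include <stdio.h>

  #define L2CAP_CONF_MAX_CONF_RSP 2  /* assumed value, for illustration */

  int main(void)
  {
          unsigned char num_conf_rsp = 0;
          int i;

          /* Unbounded: 256 config requests wrap the u8 counter back to 0. */
          for (i = 0; i < 256; i++)
                  num_conf_rsp++;
          printf("unbounded counter after 256 requests: %u\n", num_conf_rsp);

          /* Bounded, as the patch does: the counter saturates instead. */
          num_conf_rsp = 0;
          for (i = 0; i < 256; i++)
                  if (num_conf_rsp < L2CAP_CONF_MAX_CONF_RSP)
                          num_conf_rsp++;
          printf("bounded counter after 256 requests: %u\n", num_conf_rsp);

          return 0;
  }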
Signed-off-by: Sungwoo Kim iam@sung-woo.kim Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Baisong Zhong zhongbaisong@huawei.com Reviewed-by: Liu Jian liujian56@huawei.com Reviewed-by: Yue Haibing yuehaibing@huawei.com Reviewed-by: Xiu Jianfeng xiujianfeng@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- net/bluetooth/l2cap_core.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c index bbba3beffcd3..70bfd9e8913e 100644 --- a/net/bluetooth/l2cap_core.c +++ b/net/bluetooth/l2cap_core.c @@ -4440,7 +4440,8 @@ static inline int l2cap_config_req(struct l2cap_conn *conn,
chan->ident = cmd->ident; l2cap_send_cmd(conn, cmd->ident, L2CAP_CONF_RSP, len, rsp); - chan->num_conf_rsp++; + if (chan->num_conf_rsp < L2CAP_CONF_MAX_CONF_RSP) + chan->num_conf_rsp++;
/* Reset config buffer. */ chan->conf_len = 0;
From: Jakub Sitnicki jakub@cloudflare.com
mainline inclusion from mainline-v6.1-rc6 commit b68777d54fac21fc833ec26ea1a2a84f975ab035 category: bugfix bugzilla: 188056, https://gitee.com/src-openeuler/kernel/issues/I62RNU CVE: CVE-2022-4129
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
sk->sk_user_data has multiple users, which are not compatible with each other. Writers must synchronize by grabbing the sk->sk_callback_lock.
l2tp currently fails to grab the lock when modifying the underlying tunnel socket fields. Fix it by adding appropriate locking.
We err on the side of safety and grab the sk_callback_lock also inside the sk_destruct callback overridden by l2tp, even though there should be no refs allowing access to the sock at the time when sk_destruct gets called.
v4:
- serialize write to sk_user_data in l2tp sk_destruct

v3:
- switch from sock lock to sk_callback_lock
- document write-protection for sk_user_data

v2:
- update Fixes to point to origin of the bug
- use real names in Reported/Tested-by tags
Cc: Tom Parkin tparkin@katalix.com Fixes: 3557baabf280 ("[L2TP]: PPP over L2TP driver core") Reported-by: Haowei Yan g1042620637@gmail.com Signed-off-by: Jakub Sitnicki jakub@cloudflare.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Lu Wei luwei32@huawei.com Reviewed-by: Yue Haibing yuehaibing@huawei.com Reviewed-by: Wang Weiyang wangweiyang2@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- include/net/sock.h | 2 +- net/l2tp/l2tp_core.c | 19 +++++++++++++------ 2 files changed, 14 insertions(+), 7 deletions(-)
diff --git a/include/net/sock.h b/include/net/sock.h index f4de05072597..21544682f9d5 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -318,7 +318,7 @@ struct bpf_local_storage; * @sk_tskey: counter to disambiguate concurrent tstamp requests * @sk_zckey: counter to order MSG_ZEROCOPY notifications * @sk_socket: Identd and reporting IO signals - * @sk_user_data: RPC layer private data + * @sk_user_data: RPC layer private data. Write-protected by @sk_callback_lock. * @sk_frag: cached page frag * @sk_peek_off: current peek_offset value * @sk_send_head: front of stuff to transmit diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c index dc8987ed08ad..e89852bc5309 100644 --- a/net/l2tp/l2tp_core.c +++ b/net/l2tp/l2tp_core.c @@ -1150,8 +1150,10 @@ static void l2tp_tunnel_destruct(struct sock *sk) }
/* Remove hooks into tunnel socket */ + write_lock_bh(&sk->sk_callback_lock); sk->sk_destruct = tunnel->old_sk_destruct; sk->sk_user_data = NULL; + write_unlock_bh(&sk->sk_callback_lock);
/* Call the original destructor */ if (sk->sk_destruct) @@ -1471,16 +1473,18 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net, sock = sockfd_lookup(tunnel->fd, &ret); if (!sock) goto err; - - ret = l2tp_validate_socket(sock->sk, net, tunnel->encap); - if (ret < 0) - goto err_sock; }
+ sk = sock->sk; + write_lock(&sk->sk_callback_lock); + + ret = l2tp_validate_socket(sk, net, tunnel->encap); + if (ret < 0) + goto err_sock; + tunnel->l2tp_net = net; pn = l2tp_pernet(net);
- sk = sock->sk; sock_hold(sk); tunnel->sock = sk;
@@ -1506,7 +1510,7 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net,
setup_udp_tunnel_sock(net, sock, &udp_cfg); } else { - sk->sk_user_data = tunnel; + rcu_assign_sk_user_data(sk, tunnel); }
tunnel->old_sk_destruct = sk->sk_destruct; @@ -1520,6 +1524,7 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net, if (tunnel->fd >= 0) sockfd_put(sock);
+ write_unlock(&sk->sk_callback_lock); return 0;
err_sock: @@ -1527,6 +1532,8 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net, sock_release(sock); else sockfd_put(sock); + + write_unlock(&sk->sk_callback_lock); err: return ret; }
From: Jakub Sitnicki jakub@cloudflare.com
mainline inclusion from mainline-v6.1-rc7 commit af295e854a4e3813ffbdef26dbb6a4d6226c3ea1 category: bugfix bugzilla: 188056, https://gitee.com/src-openeuler/kernel/issues/I62RNU CVE: CVE-2022-4129
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
When holding a reader-writer spin lock we cannot sleep. Calling setup_udp_tunnel_sock() with write lock held violates this rule, because we end up calling percpu_down_read(), which might sleep, as syzbot reports [1]:
 __might_resched.cold+0x222/0x26b kernel/sched/core.c:9890
 percpu_down_read include/linux/percpu-rwsem.h:49 [inline]
 cpus_read_lock+0x1b/0x140 kernel/cpu.c:310
 static_key_slow_inc+0x12/0x20 kernel/jump_label.c:158
 udp_tunnel_encap_enable include/net/udp_tunnel.h:187 [inline]
 setup_udp_tunnel_sock+0x43d/0x550 net/ipv4/udp_tunnel_core.c:81
 l2tp_tunnel_register+0xc51/0x1210 net/l2tp/l2tp_core.c:1509
 pppol2tp_connect+0xcdc/0x1a10 net/l2tp/l2tp_ppp.c:723
Trim the writer-side critical section for sk_callback_lock down to the minimum, so that it covers only operations on sk_user_data.
Also, when grabbing the sk_callback_lock, we always need to disable BH, as Eric points out. Failing to do so leads to deadlocks because we acquire sk_callback_lock in softirq context, which can get stuck waiting on us if:
1) it runs on the same CPU, or
    CPU0
    ----
    lock(clock-AF_INET6);
    <Interrupt>
      lock(clock-AF_INET6);
2) lock ordering leads to priority inversion
    CPU0                                CPU1
    ----                                ----
    lock(clock-AF_INET6);
                                        local_irq_disable();
                                        lock(&tcp_hashinfo.bhash[i].lock);
                                        lock(clock-AF_INET6);
    <Interrupt>
      lock(&tcp_hashinfo.bhash[i].lock);
... as syzbot reports [2,3]. Use the _bh variants for write_(un)lock.
[1] https://lore.kernel.org/netdev/0000000000004e78ec05eda79749@google.com/ [2] https://lore.kernel.org/netdev/000000000000e38b6605eda76f98@google.com/ [3] https://lore.kernel.org/netdev/000000000000dfa31e05eda76f75@google.com/
v2:
- Check and set sk_user_data while holding sk_callback_lock for both L2TP
  encapsulation types (IP and UDP) (Tetsuo)
Cc: Tom Parkin tparkin@katalix.com Cc: Tetsuo Handa penguin-kernel@i-love.sakura.ne.jp Fixes: b68777d54fac ("l2tp: Serialize access to sk_user_data with sk_callback_lock") Reported-by: Eric Dumazet edumazet@google.com Reported-by: syzbot+703d9e154b3b58277261@syzkaller.appspotmail.com Reported-by: syzbot+50680ced9e98a61f7698@syzkaller.appspotmail.com Reported-by: syzbot+de987172bb74a381879b@syzkaller.appspotmail.com Signed-off-by: Jakub Sitnicki jakub@cloudflare.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Lu Wei luwei32@huawei.com Reviewed-by: Yue Haibing yuehaibing@huawei.com Reviewed-by: Wang Weiyang wangweiyang2@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- net/l2tp/l2tp_core.c | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-)
diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c index e89852bc5309..d6bb1795329a 100644 --- a/net/l2tp/l2tp_core.c +++ b/net/l2tp/l2tp_core.c @@ -1476,11 +1476,12 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net, }
sk = sock->sk; - write_lock(&sk->sk_callback_lock); - + write_lock_bh(&sk->sk_callback_lock); ret = l2tp_validate_socket(sk, net, tunnel->encap); if (ret < 0) - goto err_sock; + goto err_inval_sock; + rcu_assign_sk_user_data(sk, tunnel); + write_unlock_bh(&sk->sk_callback_lock);
tunnel->l2tp_net = net; pn = l2tp_pernet(net); @@ -1509,8 +1510,6 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net, };
setup_udp_tunnel_sock(net, sock, &udp_cfg); - } else { - rcu_assign_sk_user_data(sk, tunnel); }
tunnel->old_sk_destruct = sk->sk_destruct; @@ -1524,16 +1523,18 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net, if (tunnel->fd >= 0) sockfd_put(sock);
- write_unlock(&sk->sk_callback_lock); return 0;
err_sock: + write_lock_bh(&sk->sk_callback_lock); + rcu_assign_sk_user_data(sk, NULL); +err_inval_sock: + write_unlock_bh(&sk->sk_callback_lock); + if (tunnel->fd < 0) sock_release(sock); else sockfd_put(sock); - - write_unlock(&sk->sk_callback_lock); err: return ret; }
From: Junhao He hejunhao3@huawei.com
driver inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5YCYK
--------------------------------------------------------------------------
The TRCSEQRSTEVR and TRCSEQSTR registers are not implemented if TRCIDR5.NUMSEQSTATE == 0. Skip accessing these registers in that case.
Signed-off-by: Junhao He hejunhao3@huawei.com Reviewed-by: Ling Mingqiang lingmingqiang@huawei.com Reviewed-by: Xiongfeng Wang wangxiongfeng2@huawei.com Reviewed-by: Yang Shen shenyang39@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- .../coresight/coresight-etm4x-core.c | 19 ++++++++++++------- 1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c index 9d0895cde9d6..eec2ded50646 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c @@ -428,8 +428,10 @@ static int etm4_enable_hw(struct etmv4_drvdata *drvdata) etm4x_relaxed_write32(csa, config->vipcssctlr, TRCVIPCSSCTLR); for (i = 0; i < drvdata->nrseqstate - 1; i++) etm4x_relaxed_write32(csa, config->seq_ctrl[i], TRCSEQEVRn(i)); - etm4x_relaxed_write32(csa, config->seq_rst, TRCSEQRSTEVR); - etm4x_relaxed_write32(csa, config->seq_state, TRCSEQSTR); + if (drvdata->nrseqstate) { + etm4x_relaxed_write32(csa, config->seq_rst, TRCSEQRSTEVR); + etm4x_relaxed_write32(csa, config->seq_state, TRCSEQSTR); + } etm4x_relaxed_write32(csa, config->ext_inp, TRCEXTINSELR);
for (i = 0; i < drvdata->nr_cntr; i++) { @@ -1622,9 +1624,10 @@ static int __etm4_cpu_save(struct etmv4_drvdata *drvdata) for (i = 0; i < drvdata->nrseqstate - 1; i++) state->trcseqevr[i] = etm4x_read32(csa, TRCSEQEVRn(i));
- state->trcseqrstevr = etm4x_read32(csa, TRCSEQRSTEVR); - - state->trcseqstr = etm4x_read32(csa, TRCSEQSTR); + if (drvdata->nrseqstate) { + state->trcseqrstevr = etm4x_read32(csa, TRCSEQRSTEVR); + state->trcseqstr = etm4x_read32(csa, TRCSEQSTR); + } state->trcextinselr = etm4x_read32(csa, TRCEXTINSELR);
for (i = 0; i < drvdata->nr_cntr; i++) { @@ -1751,8 +1754,10 @@ static void __etm4_cpu_restore(struct etmv4_drvdata *drvdata) for (i = 0; i < drvdata->nrseqstate - 1; i++) etm4x_relaxed_write32(csa, state->trcseqevr[i], TRCSEQEVRn(i));
- etm4x_relaxed_write32(csa, state->trcseqrstevr, TRCSEQRSTEVR); - etm4x_relaxed_write32(csa, state->trcseqstr, TRCSEQSTR); + if (drvdata->nrseqstate) { + etm4x_relaxed_write32(csa, state->trcseqrstevr, TRCSEQRSTEVR); + etm4x_relaxed_write32(csa, state->trcseqstr, TRCSEQSTR); + } etm4x_relaxed_write32(csa, state->trcextinselr, TRCEXTINSELR);
for (i = 0; i < drvdata->nr_cntr; i++) {
From: Junhao He hejunhao3@huawei.com
driver inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5YCYK
--------------------------------------------------------------------------
Add the ACPI match ID for the Hip09 ETE.
Signed-off-by: Junhao He hejunhao3@huawei.com Reviewed-by: Ling Mingqiang lingmingqiang@huawei.com Reviewed-by: Yicong Yang yangyicong@huawei.com Reviewed-by: Xiongfeng Wang wangxiongfeng2@huawei.com Reviewed-by: Yang Shen shenyang39@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/hwtracing/coresight/coresight-etm4x-core.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c index eec2ded50646..95c92d2d3915 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c @@ -2140,12 +2140,19 @@ static const struct of_device_id etm4_sysreg_match[] = { {} };
+static const struct acpi_device_id static_ete_ids[] = { + {"HISI0461", 0}, + {} +}; +MODULE_DEVICE_TABLE(acpi, static_ete_ids); + static struct platform_driver etm4_platform_driver = { .probe = etm4_probe_platform_dev, .remove = etm4_remove_platform_dev, .driver = { .name = "coresight-etm4x", .of_match_table = etm4_sysreg_match, + .acpi_match_table = static_ete_ids, .suppress_bind_attrs = true, }, };
From: Yicong Yang yangyicong@hisilicon.com
mainline inclusion from mainline-v5.12-rc1 commit 566c6120f095 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I63WDM CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------------------------------------------------
Currently we use the concrete version number to determine max_cmd_dword. New entries would have to be added for compatible hardware of a newer version or on a new platform; otherwise the device will use 16 dwords instead of 64 even if it supports more, which degrades performance. This hurts compatibility and maintainability.

Drop the switch-case statement for the version check. Only versions lower than 0x351 are limited to a maximum of 16 command dwords.
Signed-off-by: Yicong Yang yangyicong@hisilicon.com Acked-by: John Garry john.garry@huawei.com Link: https://lore.kernel.org/r/1610526716-14882-1-git-send-email-yangyicong@hisil... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Wangming Shao shaowangming@h-partners.com Reviewed-by: Yicong Yang yangyicong@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/spi/spi-hisi-sfc-v3xx.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/drivers/spi/spi-hisi-sfc-v3xx.c b/drivers/spi/spi-hisi-sfc-v3xx.c index 4650b483a33d..832b80e7ef67 100644 --- a/drivers/spi/spi-hisi-sfc-v3xx.c +++ b/drivers/spi/spi-hisi-sfc-v3xx.c @@ -465,14 +465,10 @@ static int hisi_sfc_v3xx_probe(struct platform_device *pdev)
version = readl(host->regbase + HISI_SFC_V3XX_VERSION);
- switch (version) { - case 0x351: + if (version >= 0x351) host->max_cmd_dword = 64; - break; - default: + else host->max_cmd_dword = 16; - break; - }
ret = devm_spi_register_controller(dev, ctlr); if (ret)
From: Yicong Yang yangyicong@hisilicon.com
mainline inclusion from mainline-v5.12-rc1 commit 6d2386e36440 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I63WDM CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------------------------------------------------
The address mode of the controller is either 3 or 4; it is configured by the firmware and cannot be modified by the OS driver. Get the firmware configuration and add an address mode check in .supports_op() to block invalid operations.
Signed-off-by: Yicong Yang yangyicong@hisilicon.com Acked-by: John Garry john.garry@huawei.com Link: https://lore.kernel.org/r/1611740450-47975-3-git-send-email-yangyicong@hisil... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Wangming Shao shaowangming@h-partners.com Reviewed-by: Yicong Yang yangyicong@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/spi/spi-hisi-sfc-v3xx.c | 25 ++++++++++++++++++++++++- 1 file changed, 24 insertions(+), 1 deletion(-)
diff --git a/drivers/spi/spi-hisi-sfc-v3xx.c b/drivers/spi/spi-hisi-sfc-v3xx.c index 832b80e7ef67..385eb7bba05a 100644 --- a/drivers/spi/spi-hisi-sfc-v3xx.c +++ b/drivers/spi/spi-hisi-sfc-v3xx.c @@ -19,6 +19,8 @@
#define HISI_SFC_V3XX_VERSION (0x1f8)
+#define HISI_SFC_V3XX_GLB_CFG (0x100) +#define HISI_SFC_V3XX_GLB_CFG_CS0_ADDR_MODE BIT(2) #define HISI_SFC_V3XX_RAW_INT_STAT (0x120) #define HISI_SFC_V3XX_INT_STAT (0x124) #define HISI_SFC_V3XX_INT_MASK (0x128) @@ -75,6 +77,7 @@ struct hisi_sfc_v3xx_host { void __iomem *regbase; int max_cmd_dword; struct completion *completion; + u8 address_mode; int irq; };
@@ -168,10 +171,18 @@ static int hisi_sfc_v3xx_adjust_op_size(struct spi_mem *mem, static bool hisi_sfc_v3xx_supports_op(struct spi_mem *mem, const struct spi_mem_op *op) { + struct spi_device *spi = mem->spi; + struct hisi_sfc_v3xx_host *host; + + host = spi_controller_get_devdata(spi->master); + if (op->data.buswidth > 4 || op->dummy.buswidth > 4 || op->addr.buswidth > 4 || op->cmd.buswidth > 4) return false;
+ if (op->addr.nbytes != host->address_mode && op->addr.nbytes) + return false; + return spi_mem_default_supports_op(mem, op); }
@@ -416,7 +427,7 @@ static int hisi_sfc_v3xx_probe(struct platform_device *pdev) struct device *dev = &pdev->dev; struct hisi_sfc_v3xx_host *host; struct spi_controller *ctlr; - u32 version; + u32 version, glb_config; int ret;
ctlr = spi_alloc_master(&pdev->dev, sizeof(*host)); @@ -463,6 +474,18 @@ static int hisi_sfc_v3xx_probe(struct platform_device *pdev) ctlr->num_chipselect = 1; ctlr->mem_ops = &hisi_sfc_v3xx_mem_ops;
+ /* + * The address mode of the controller is either 3 or 4, + * which is indicated by the address mode bit in + * the global config register. The register is read only + * for the OS driver. + */ + glb_config = readl(host->regbase + HISI_SFC_V3XX_GLB_CFG); + if (glb_config & HISI_SFC_V3XX_GLB_CFG_CS0_ADDR_MODE) + host->address_mode = 4; + else + host->address_mode = 3; + version = readl(host->regbase + HISI_SFC_V3XX_VERSION);
if (version >= 0x351)
From: Yicong Yang yangyicong@hisilicon.com
mainline inclusion from mainline-v5.13-rc1 commit 4c84e42d29af category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I63WDM CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------------------------------------------------
We mask the irq when waiting for the command completion times out, but this does not stop an already running irq handler. Call synchronize_irq() after masking the irq to make sure no handler is still running.
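As a rough illustration of the pattern (a simplified sketch, not the actual hisi_sfc_v3xx code; sfc_cmd_timeout_cleanup is a made-up helper name), masking only prevents new interrupts, while synchronize_irq() waits for any handler that is already in flight:

static void sfc_cmd_timeout_cleanup(struct hisi_sfc_v3xx_host *host)
{
	/* Mask the interrupt in hardware: no new interrupts will fire. */
	hisi_sfc_v3xx_disable_int(host);
	/* Wait for a handler that may already be running on another CPU. */
	synchronize_irq(host->irq);
	/* Only now is it safe to drop the completion the handler uses. */
	host->completion = NULL;
}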
Acked-by: John Garry john.garry@huawei.com Signed-off-by: Yicong Yang yangyicong@hisilicon.com Link: https://lore.kernel.org/r/1618228708-37949-2-git-send-email-yangyicong@hisil... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Wangming Shao shaowangming@h-partners.com Reviewed-by: Yicong Yang yangyicong@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/spi/spi-hisi-sfc-v3xx.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/spi/spi-hisi-sfc-v3xx.c b/drivers/spi/spi-hisi-sfc-v3xx.c index 385eb7bba05a..0d9e10302b66 100644 --- a/drivers/spi/spi-hisi-sfc-v3xx.c +++ b/drivers/spi/spi-hisi-sfc-v3xx.c @@ -342,6 +342,7 @@ static int hisi_sfc_v3xx_generic_exec_op(struct hisi_sfc_v3xx_host *host, ret = 0;
hisi_sfc_v3xx_disable_int(host); + synchronize_irq(host->irq); host->completion = NULL; } else { ret = hisi_sfc_v3xx_wait_cmd_idle(host);
From: Yicong Yang yangyicong@hisilicon.com
mainline inclusion from mainline-v5.13-rc1 commit 4a46f88681ca category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I63WDM CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------------------------------------------------
We use ACPI_PTR() and related #ifdef/#endif protection for the id table. This is unnecessary, as struct acpi_device_id is defined in mod_devicetable.h and doesn't rely on ACPI. The driver doesn't use any ACPI APIs, so it can be compiled in the ACPI=n case with no warnings.
So remove ACPI_PTR() and the related #ifdef/#endif protection, and replace the acpi.h header with mod_devicetable.h.
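For reference, ACPI_PTR() behaves roughly as below (paraphrased from include/linux/acpi.h; the exact text may differ). Without CONFIG_ACPI it collapses the pointer to NULL, which is why the table previously needed the #ifdef guard, while struct acpi_device_id itself comes from mod_devicetable.h and is always available:

#ifdef CONFIG_ACPI
#define ACPI_PTR(_ptr)	(_ptr)
#else
#define ACPI_PTR(_ptr)	(NULL)
#endif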
Acked-by: John Garry john.garry@huawei.com Signed-off-by: Yicong Yang yangyicong@hisilicon.com Link: https://lore.kernel.org/r/1618228708-37949-3-git-send-email-yangyicong@hisil... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Wangming Shao shaowangming@h-partners.com Reviewed-by: Yicong Yang yangyicong@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/spi/spi-hisi-sfc-v3xx.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/spi/spi-hisi-sfc-v3xx.c b/drivers/spi/spi-hisi-sfc-v3xx.c index 0d9e10302b66..d3a23b1c2a4c 100644 --- a/drivers/spi/spi-hisi-sfc-v3xx.c +++ b/drivers/spi/spi-hisi-sfc-v3xx.c @@ -5,13 +5,13 @@ // Copyright (c) 2019 HiSilicon Technologies Co., Ltd. // Author: John Garry john.garry@huawei.com
-#include <linux/acpi.h> #include <linux/bitops.h> #include <linux/completion.h> #include <linux/dmi.h> #include <linux/interrupt.h> #include <linux/iopoll.h> #include <linux/module.h> +#include <linux/mod_devicetable.h> #include <linux/platform_device.h> #include <linux/slab.h> #include <linux/spi/spi.h> @@ -508,18 +508,16 @@ static int hisi_sfc_v3xx_probe(struct platform_device *pdev) return ret; }
-#if IS_ENABLED(CONFIG_ACPI) static const struct acpi_device_id hisi_sfc_v3xx_acpi_ids[] = { {"HISI0341", 0}, {} }; MODULE_DEVICE_TABLE(acpi, hisi_sfc_v3xx_acpi_ids); -#endif
static struct platform_driver hisi_sfc_v3xx_spi_driver = { .driver = { .name = "hisi-sfc-v3xx", - .acpi_match_table = ACPI_PTR(hisi_sfc_v3xx_acpi_ids), + .acpi_match_table = hisi_sfc_v3xx_acpi_ids, }, .probe = hisi_sfc_v3xx_probe, };
From: Chen Wandun chenwandun@huawei.com
mainline inclusion from mainline-v6.1-rc7 commit de1ccfb648243a031cfbdc2d5571dfdaf5023106 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I645DG CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
A softlockup occurs while scanning for free swap slots under huge memory pressure. The test scenario: 64 CPU cores, 64GB of memory, and 28 zram devices, each with a disksize of 50MB.
LATENCY_LIMIT is meant to prevent softlockups in scan_swap_map_slots(), but the actual number of loop iterations can exceed LATENCY_LIMIT because the "goto checks" and "goto scan" paths are taken repeatedly without the latency budget being decreased.
To fix it, decrease latency_ration at the beginning of each scan iteration, before the slot is examined.
There is also a suspicious spot that may cause softlockups in get_swap_pages(): the "goto start_over" can result in continuous scanning of the swap partition, and if there is no cond_resched() in scan_swap_map_slots(), that could cause a softlockup (I am not sure about this).
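A simplified sketch of the latency-budget pattern the fix relies on (illustrative only, not the exact scan_swap_map_slots() code): the counter has to be decremented on every iteration of the scanning loop, so that long scans periodically call cond_resched() and never trip the soft-lockup watchdog.

	while (++offset <= highest_bit) {
		if (unlikely(--latency_ration < 0)) {
			cond_resched();			/* yield the CPU */
			latency_ration = LATENCY_LIMIT;	/* refill the budget */
			scanned_many = true;
		}
		/* ... check si->swap_map[offset], possibly "goto checks" ... */
	}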
WARN: soft lockup - CPU#11 stuck for 11s! [kswapd0:466] CPU: 11 PID: 466 Comm: kswapd@ Kdump: loaded Tainted: G dump backtrace+0x0/0x1le4 show stack+0x20/@x2c dump_stack+0xd8/0x140 watchdog print_info+0x48/0x54 watchdog_process_before_softlockup+0x98/0xa0 watchdog_timer_fn+0xlac/0x2d0 hrtimer_rum_queues+0xb0/0x130 hrtimer_interrupt+0x13c/0x3c0 arch_timer_handler_virt+0x3c/0x50 handLe_percpu_devid_irq+0x90/0x1f4 handle domain irq+0x84/0x100 gic_handle_irq+0x88/0x2b0 e11 ira+0xhB/Bx140 scan_swap_map_slots+0x678/0x890 get_swap_pages+0x29c/0x440 get_swap_page+0x120/0x2e0 add_to_swap+UX2U/0XyC shrink_page_list+0x5d0/0x152c shrink_inactive_list+0xl6c/Bx500 shrink_lruvec+0x270/0x304
WARN: soft lockup - CPU#32 stuck for 11s! [stress-ng:309915] watchdog_timer_fn+0x1ac/0x2d0 __run_hrtimer+0x98/0x2a0 __hrtimer_run_queues+0xb0/0x130 hrtimer_interrupt+0x13c/0x3c0 arch_timer_handler_virt+0x3c/0x50 handle_percpu_devid_irq+0x90/0x1f4 __handle_domain_irq+0x84/0x100 gic_handle_irq+0x88/0x2b0 el1_irq+0xb8/0x140 get_swap_pages+0x1e8/0x440 get_swap_page+0x1c8/0x2e0 add_to_swap+0x20/0x9c shrink_page_list+0x5d0/0x152c reclaim_pages+0x160/0x310 madvise_cold_or_pageout_pte_range+0x7bc/0xe3c walk_pmd_range.isra.0+0xac/0x22c walk_pud_range+0xfc/0x1c0 walk_pgd_range+0x158/0x1b0 __walk_page_range+0x64/0x100 walk_page_range+0x104/0x150
Link: https://lkml.kernel.org/r/20221118133850.3360369-1-chenwandun@huawei.com Fixes: 048c27fd7281 ("[PATCH] swap: scan_swap_map latency breaks") Signed-off-by: Chen Wandun chenwandun@huawei.com Reviewed-by: "Huang, Ying" ying.huang@intel.com Cc: Hugh Dickins hugh@veritas.com Cc: Kefeng Wang wangkefeng.wang@huawei.com Cc: Nanyong Sun sunnanyong@huawei.com Cc: xialonglong1@huawei.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Conflicts: mm/swapfile.c Signed-off-by: Chen Wandun chenwandun@huawei.com Reviewed-by: Kefeng Wang wangkefeng.wang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- mm/swapfile.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/mm/swapfile.c b/mm/swapfile.c index 7faa30f460e4..f4e45e84061c 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -944,6 +944,11 @@ static int scan_swap_map_slots(struct swap_info_struct *si, scan: spin_unlock(&si->lock); while (++offset <= READ_ONCE(si->highest_bit)) { + if (unlikely(--latency_ration < 0)) { + cond_resched(); + latency_ration = LATENCY_LIMIT; + scanned_many = true; + } if (data_race(!si->swap_map[offset])) { spin_lock(&si->lock); goto checks; @@ -953,14 +958,14 @@ static int scan_swap_map_slots(struct swap_info_struct *si, spin_lock(&si->lock); goto checks; } + } + offset = si->lowest_bit; + while (offset < scan_base) { if (unlikely(--latency_ration < 0)) { cond_resched(); latency_ration = LATENCY_LIMIT; scanned_many = true; } - } - offset = si->lowest_bit; - while (offset < scan_base) { if (data_race(!si->swap_map[offset])) { spin_lock(&si->lock); goto checks; @@ -970,11 +975,6 @@ static int scan_swap_map_slots(struct swap_info_struct *si, spin_lock(&si->lock); goto checks; } - if (unlikely(--latency_ration < 0)) { - cond_resched(); - latency_ration = LATENCY_LIMIT; - scanned_many = true; - } offset++; } spin_lock(&si->lock);
From: Miaohe Lin linmiaohe@huawei.com
mainline inclusion from mainline-v5.14-rc1 commit 63d8620ecf93b5d8d0a254471184d08f8e8f538d category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I645JI CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
Patch series "close various race windows for swap", v6.
When I was investigating the swap code, I found some possible race windows. This series aims to fix all these races. But using the current get/put_swap_device() to guard against concurrent swapoff for swap_readpage() looks terrible, because swap_readpage() may take a really long time. To reduce the performance overhead on the hot path as much as possible, it appears we can use a percpu_ref to close this race window (as suggested by Huang, Ying). Patch 1 adds percpu_ref support for swap, and most of the remaining patches use it to close various race windows. More details can be found in the respective changelogs.
This patch (of 4):
Using the current get/put_swap_device() to guard against concurrent swapoff for some swap operations, e.g. swap_readpage(), looks terrible because they might take a really long time. This patch adds percpu_ref support to serialize against concurrent swapoff (as suggested by Huang, Ying). Also remove the SWP_VALID flag, since it was only used together with the RCU solution.
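A sketch of how a caller is expected to use the pair after this change (simplified, not a specific call site): the reference taken in get_swap_device() keeps swapoff() from tearing the device down while the entry is in use, and costs only a percpu_ref get/put on the hot path.

	struct swap_info_struct *si;

	si = get_swap_device(entry);
	if (!si)
		return;		/* swap device is gone or being swapped off */
	/* ... si->swap_map, si->cluster_info etc. are safe to use here ... */
	put_swap_device(si);	/* drops si->users; may let swapoff() finish */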
Link: https://lkml.kernel.org/r/20210426123316.806267-1-linmiaohe@huawei.com Link: https://lkml.kernel.org/r/20210426123316.806267-2-linmiaohe@huawei.com Signed-off-by: Miaohe Lin linmiaohe@huawei.com Reviewed-by: "Huang, Ying" ying.huang@intel.com Cc: Alex Shi alexs@kernel.org Cc: David Hildenbrand david@redhat.com Cc: Dennis Zhou dennis@kernel.org Cc: Hugh Dickins hughd@google.com Cc: Johannes Weiner hannes@cmpxchg.org Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Matthew Wilcox willy@infradead.org Cc: Michal Hocko mhocko@suse.com Cc: Minchan Kim minchan@kernel.org Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Wei Yang richard.weiyang@gmail.com Cc: Yang Shi shy828301@gmail.com Cc: Yu Zhao yuzhao@google.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Chen Wandun chenwandun@huawei.com Reviewed-by: Kefeng Wangwangkefeng.wang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- include/linux/swap.h | 5 +-- mm/swapfile.c | 79 +++++++++++++++++++++++++++----------------- 2 files changed, 52 insertions(+), 32 deletions(-)
diff --git a/include/linux/swap.h b/include/linux/swap.h index e0f69151d2af..a65f3283ac05 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -187,7 +187,6 @@ enum { SWP_PAGE_DISCARD = (1 << 10), /* freed swap page-cluster discards */ SWP_STABLE_WRITES = (1 << 11), /* no overwrite PG_writeback pages */ SWP_SYNCHRONOUS_IO = (1 << 12), /* synchronous IO is efficient */ - SWP_VALID = (1 << 13), /* swap is valid to be operated on? */ /* add others here before... */ SWP_SCANNING = (1 << 14), /* refcount in scan_swap_map */ }; @@ -250,6 +249,7 @@ struct swap_cluster_list { * The in-memory structure used to track swap areas. */ struct swap_info_struct { + struct percpu_ref users; /* indicate and keep swap device valid. */ unsigned long flags; /* SWP_USED etc: see above */ signed short prio; /* swap priority of this type */ struct plist_node list; /* entry in swap_active_head */ @@ -270,6 +270,7 @@ struct swap_info_struct { struct block_device *bdev; /* swap device or bdev of swap file */ struct file *swap_file; /* seldom referenced */ unsigned int old_block_size; /* seldom referenced */ + struct completion comp; /* seldom referenced */ #ifdef CONFIG_FRONTSWAP unsigned long *frontswap_map; /* frontswap in-use, one bit per page */ atomic_t frontswap_pages; /* frontswap pages in-use counter */ @@ -535,7 +536,7 @@ sector_t swap_page_sector(struct page *page);
static inline void put_swap_device(struct swap_info_struct *si) { - rcu_read_unlock(); + percpu_ref_put(&si->users); }
#else /* CONFIG_SWAP */ diff --git a/mm/swapfile.c b/mm/swapfile.c index f4e45e84061c..0c560204acf4 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -39,6 +39,7 @@ #include <linux/export.h> #include <linux/swap_slots.h> #include <linux/sort.h> +#include <linux/completion.h>
#include <asm/tlbflush.h> #include <linux/swapops.h> @@ -512,6 +513,14 @@ static void swap_discard_work(struct work_struct *work) spin_unlock(&si->lock); }
+static void swap_users_ref_free(struct percpu_ref *ref) +{ + struct swap_info_struct *si; + + si = container_of(ref, struct swap_info_struct, users); + complete(&si->comp); +} + static void alloc_cluster(struct swap_info_struct *si, unsigned long idx) { struct swap_cluster_info *ci = si->cluster_info; @@ -1274,18 +1283,12 @@ static unsigned char __swap_entry_free_locked(struct swap_info_struct *p, * via preventing the swap device from being swapoff, until * put_swap_device() is called. Otherwise return NULL. * - * The entirety of the RCU read critical section must come before the - * return from or after the call to synchronize_rcu() in - * enable_swap_info() or swapoff(). So if "si->flags & SWP_VALID" is - * true, the si->map, si->cluster_info, etc. must be valid in the - * critical section. - * * Notice that swapoff or swapoff+swapon can still happen before the - * rcu_read_lock() in get_swap_device() or after the rcu_read_unlock() - * in put_swap_device() if there isn't any other way to prevent - * swapoff, such as page lock, page table lock, etc. The caller must - * be prepared for that. For example, the following situation is - * possible. + * percpu_ref_tryget_live() in get_swap_device() or after the + * percpu_ref_put() in put_swap_device() if there isn't any other way + * to prevent swapoff, such as page lock, page table lock, etc. The + * caller must be prepared for that. For example, the following + * situation is possible. * * CPU1 CPU2 * do_swap_page() @@ -1313,21 +1316,27 @@ struct swap_info_struct *get_swap_device(swp_entry_t entry) si = swp_swap_info(entry); if (!si) goto bad_nofile; - - rcu_read_lock(); - if (data_race(!(si->flags & SWP_VALID))) - goto unlock_out; + if (!percpu_ref_tryget_live(&si->users)) + goto out; + /* + * Guarantee the si->users are checked before accessing other + * fields of swap_info_struct. + * + * Paired with the spin_unlock() after setup_swap_info() in + * enable_swap_info(). + */ + smp_rmb(); offset = swp_offset(entry); if (offset >= si->max) - goto unlock_out; + goto put_out;
return si; bad_nofile: pr_err("%s: %s%08lx\n", __func__, Bad_file, entry.val); out: return NULL; -unlock_out: - rcu_read_unlock(); +put_out: + percpu_ref_put(&si->users); return NULL; }
@@ -2500,7 +2509,7 @@ static void setup_swap_info(struct swap_info_struct *p, int prio,
static void _enable_swap_info(struct swap_info_struct *p) { - p->flags |= SWP_WRITEOK | SWP_VALID; + p->flags |= SWP_WRITEOK; atomic_long_add(p->pages, &nr_swap_pages); total_swap_pages += p->pages;
@@ -2531,10 +2540,9 @@ static void enable_swap_info(struct swap_info_struct *p, int prio, spin_unlock(&p->lock); spin_unlock(&swap_lock); /* - * Guarantee swap_map, cluster_info, etc. fields are valid - * between get/put_swap_device() if SWP_VALID bit is set + * Finished initializing swap device, now it's safe to reference it. */ - synchronize_rcu(); + percpu_ref_resurrect(&p->users); spin_lock(&swap_lock); spin_lock(&p->lock); _enable_swap_info(p); @@ -2650,16 +2658,16 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
reenable_swap_slots_cache_unlock();
- spin_lock(&swap_lock); - spin_lock(&p->lock); - p->flags &= ~SWP_VALID; /* mark swap device as invalid */ - spin_unlock(&p->lock); - spin_unlock(&swap_lock); /* - * wait for swap operations protected by get/put_swap_device() - * to complete + * Wait for swap operations protected by get/put_swap_device() + * to complete. + * + * We need synchronize_rcu() here to protect the accessing to + * the swap cache data structure. */ + percpu_ref_kill(&p->users); synchronize_rcu(); + wait_for_completion(&p->comp);
flush_work(&p->discard_work);
@@ -2891,6 +2899,12 @@ static struct swap_info_struct *alloc_swap_info(void) if (!p) return ERR_PTR(-ENOMEM);
+ if (percpu_ref_init(&p->users, swap_users_ref_free, + PERCPU_REF_INIT_DEAD, GFP_KERNEL)) { + kvfree(p); + return ERR_PTR(-ENOMEM); + } + spin_lock(&swap_lock); for (type = 0; type < nr_swapfiles; type++) { if (!(swap_info[type]->flags & SWP_USED)) @@ -2898,6 +2912,7 @@ static struct swap_info_struct *alloc_swap_info(void) } if (type >= MAX_SWAPFILES) { spin_unlock(&swap_lock); + percpu_ref_exit(&p->users); kvfree(p); return ERR_PTR(-EPERM); } @@ -2925,9 +2940,13 @@ static struct swap_info_struct *alloc_swap_info(void) plist_node_init(&p->avail_lists[i], 0); p->flags = SWP_USED; spin_unlock(&swap_lock); - kvfree(defer); + if (defer) { + percpu_ref_exit(&defer->users); + kvfree(defer); + } spin_lock_init(&p->lock); spin_lock_init(&p->cont_lock); + init_completion(&p->comp);
return p; }
From: Chen Wandun chenwandun@huawei.com
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I645JI
-------------------------------
This patch fixes the KABI problem introduced by commit ("mm/swapfile: use percpu_ref to serialize against concurrent swapoff").
Since swap_info_struct is referenced by pointer and is not used directly by third-party modules, use KABI_EXTEND for the two new member variables in swap_info_struct. Besides, don't remove the now-unused enum value SWP_VALID, to avoid unnecessary undefined-symbol alarms.
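The resulting layout looks roughly like the sketch below (abridged): the new members are appended with KABI_EXTEND after the reserved slots, so the offsets of all pre-existing members that external modules may rely on stay unchanged, and the flexible array remains last.

struct swap_info_struct {
	unsigned long flags;			/* existing members keep their offsets */
	/* ... existing members ... */
	KABI_RESERVE(1)
	KABI_RESERVE(2)
	KABI_EXTEND(struct percpu_ref users)	/* new: keeps swap device valid */
	KABI_EXTEND(struct completion comp)	/* new: seldom referenced */
	struct plist_node avail_lists[];	/* flexible array stays last */
};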
Signed-off-by: Chen Wandun chenwandun@huawei.com Reviewed-by: Kefeng Wangwangkefeng.wang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- include/linux/swap.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/include/linux/swap.h b/include/linux/swap.h index a65f3283ac05..b201a859ed7d 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -187,6 +187,7 @@ enum { SWP_PAGE_DISCARD = (1 << 10), /* freed swap page-cluster discards */ SWP_STABLE_WRITES = (1 << 11), /* no overwrite PG_writeback pages */ SWP_SYNCHRONOUS_IO = (1 << 12), /* synchronous IO is efficient */ + SWP_VALID = (1 << 13), /* swap is valid to be operated on? */ /* add others here before... */ SWP_SCANNING = (1 << 14), /* refcount in scan_swap_map */ }; @@ -249,7 +250,6 @@ struct swap_cluster_list { * The in-memory structure used to track swap areas. */ struct swap_info_struct { - struct percpu_ref users; /* indicate and keep swap device valid. */ unsigned long flags; /* SWP_USED etc: see above */ signed short prio; /* swap priority of this type */ struct plist_node list; /* entry in swap_active_head */ @@ -270,7 +270,6 @@ struct swap_info_struct { struct block_device *bdev; /* swap device or bdev of swap file */ struct file *swap_file; /* seldom referenced */ unsigned int old_block_size; /* seldom referenced */ - struct completion comp; /* seldom referenced */ #ifdef CONFIG_FRONTSWAP unsigned long *frontswap_map; /* frontswap in-use, one bit per page */ atomic_t frontswap_pages; /* frontswap pages in-use counter */ @@ -296,6 +295,8 @@ struct swap_info_struct { struct swap_cluster_list discard_clusters; /* discard clusters list */ KABI_RESERVE(1) KABI_RESERVE(2) + KABI_EXTEND(struct percpu_ref users) /* indicate and keep swap device valid. */ + KABI_EXTEND(struct completion comp) /* seldom referenced */ struct plist_node avail_lists[]; /* * entries in swap_avail_heads, one * entry per node.
From: Miaohe Lin linmiaohe@huawei.com
mainline inclusion from mainline-v5.14-rc1 commit 2799e77529c2a25492a4395db93996e3dacd762d category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I645JI CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
When I was investigating the swap code, I found the below possible race window:
CPU 1                                    CPU 2
-----                                    -----
do_swap_page
  if (data_race(si->flags & SWP_SYNCHRONOUS_IO)
  swap_readpage
    if (data_race(sis->flags & SWP_FS_OPS)) {
                                         swapoff
                                           ..
                                           p->swap_file = NULL;
                                           ..
    struct file *swap_file = sis->swap_file;
    struct address_space *mapping = swap_file->f_mapping;[oops!]
Note that for pages that are swapped in through the swap cache, this isn't an issue: the page is locked and the swap entry is marked with SWAP_HAS_CACHE, so swapoff() cannot proceed until the page has been unlocked.
Fix this race by using get/put_swap_device() to guard against concurrent swapoff.
Link: https://lkml.kernel.org/r/20210426123316.806267-3-linmiaohe@huawei.com Fixes: 0bcac06f27d7 ("mm,swap: skip swapcache for swapin of synchronous device") Signed-off-by: Miaohe Lin linmiaohe@huawei.com Reviewed-by: "Huang, Ying" ying.huang@intel.com Cc: Alex Shi alexs@kernel.org Cc: David Hildenbrand david@redhat.com Cc: Dennis Zhou dennis@kernel.org Cc: Hugh Dickins hughd@google.com Cc: Johannes Weiner hannes@cmpxchg.org Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Matthew Wilcox willy@infradead.org Cc: Michal Hocko mhocko@suse.com Cc: Minchan Kim minchan@kernel.org Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Wei Yang richard.weiyang@gmail.com Cc: Yang Shi shy828301@gmail.com Cc: Yu Zhao yuzhao@google.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Conflicts: mm/memory.c Signed-off-by: Chen Wandun chenwandun@huawei.com Reviewed-by: Kefeng Wangwangkefeng.wang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- include/linux/swap.h | 9 +++++++++ mm/memory.c | 11 +++++++++-- 2 files changed, 18 insertions(+), 2 deletions(-)
diff --git a/include/linux/swap.h b/include/linux/swap.h index b201a859ed7d..c55f6e410d05 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -552,6 +552,15 @@ static inline struct swap_info_struct *swp_swap_info(swp_entry_t entry) return NULL; }
+static inline struct swap_info_struct *get_swap_device(swp_entry_t entry) +{ + return NULL; +} + +static inline void put_swap_device(struct swap_info_struct *si) +{ +} + #define swap_address_space(entry) (NULL) #define get_nr_swap_pages() 0L #define total_swap_pages 0L diff --git a/mm/memory.c b/mm/memory.c index 0505f9c009c4..badf913062a2 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3366,6 +3366,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; struct page *page = NULL, *swapcache; + struct swap_info_struct *si = NULL; swp_entry_t entry; pte_t pte; int locked; @@ -3425,14 +3426,16 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) goto out; }
+ /* Prevent swapoff from happening to us. */ + si = get_swap_device(entry); + if (unlikely(!si)) + goto out;
delayacct_set_flag(DELAYACCT_PF_SWAPIN); page = lookup_swap_cache(entry, vma, vmf->address); swapcache = page;
if (!page) { - struct swap_info_struct *si = swp_swap_info(entry); - if (data_race(si->flags & SWP_SYNCHRONOUS_IO) && __swap_count(entry) == 1) { /* skip swapcache */ @@ -3604,6 +3607,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) unlock: pte_unmap_unlock(vmf->pte, vmf->ptl); out: + if (si) + put_swap_device(si); return ret; out_nomap: pte_unmap_unlock(vmf->pte, vmf->ptl); @@ -3615,6 +3620,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) unlock_page(swapcache); put_page(swapcache); } + if (si) + put_swap_device(si); return ret; }
From: Miaohe Lin linmiaohe@huawei.com
mainline inclusion from mainline-v5.14-rc1 commit 2efa33fc7f6ec94a3a538c1a264273c889be2b36 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I645JI CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
--------------------------------
When I was investigating the swap code, I found the below possible race window:
CPU 1                                    CPU 2
-----                                    -----
shmem_swapin
  swap_cluster_readahead
    if (likely(si->flags & (SWP_BLKDEV | SWP_FS_OPS))) {
                                         swapoff
                                           ..
                                           si->swap_file = NULL;
                                           ..
    struct inode *inode = si->swap_file->f_mapping->host;[oops!]
Close this race window by using get/put_swap_device() to guard against concurrent swapoff.
Link: https://lkml.kernel.org/r/20210426123316.806267-5-linmiaohe@huawei.com Fixes: 8fd2e0b505d1 ("mm: swap: check if swap backing device is congested or not") Signed-off-by: Miaohe Lin linmiaohe@huawei.com Reviewed-by: "Huang, Ying" ying.huang@intel.com Cc: Dennis Zhou dennis@kernel.org Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Hugh Dickins hughd@google.com Cc: Johannes Weiner hannes@cmpxchg.org Cc: Michal Hocko mhocko@suse.com Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Cc: Alex Shi alexs@kernel.org Cc: Matthew Wilcox willy@infradead.org Cc: Minchan Kim minchan@kernel.org Cc: Wei Yang richard.weiyang@gmail.com Cc: Yang Shi shy828301@gmail.com Cc: David Hildenbrand david@redhat.com Cc: Yu Zhao yuzhao@google.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Chen Wandun chenwandun@huawei.com Reviewed-by: Kefeng Wangwangkefeng.wang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- mm/shmem.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/mm/shmem.c b/mm/shmem.c index 258179b89a9b..f7caf1dec81c 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1711,7 +1711,8 @@ static int shmem_swapin_page(struct inode *inode, pgoff_t index, struct address_space *mapping = inode->i_mapping; struct shmem_inode_info *info = SHMEM_I(inode); struct mm_struct *charge_mm = vma ? vma->vm_mm : current->mm; - struct page *page; + struct swap_info_struct *si; + struct page *page = NULL; swp_entry_t swap; int error;
@@ -1719,6 +1720,12 @@ static int shmem_swapin_page(struct inode *inode, pgoff_t index, swap = radix_to_swp_entry(*pagep); *pagep = NULL;
+ /* Prevent swapoff from happening to us. */ + si = get_swap_device(swap); + if (!si) { + error = EINVAL; + goto failed; + } /* Look it up and read it in.. */ page = lookup_swap_cache(swap, NULL, 0); if (!page) { @@ -1780,6 +1787,8 @@ static int shmem_swapin_page(struct inode *inode, pgoff_t index, swap_free(swap);
*pagep = page; + if (si) + put_swap_device(si); return 0; failed: if (!shmem_confirm_swap(mapping, index, swap)) @@ -1790,6 +1799,9 @@ static int shmem_swapin_page(struct inode *inode, pgoff_t index, put_page(page); }
+ if (si) + put_swap_device(si); + return error; }
From: Liu Shixin liushixin2@huawei.com
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I641XX CVE: NA
--------------------------------
Commit 1378a5ee451a ("mm: store compound_nr as well as compound_order") adds a new member, compound_nr, to struct page and uses it instead of compound_order in hugetlb_cgroup_move_parent() to compute nr_pages.
In free_hugepage_to_hugetlb(), we reset page->mapping to NULL for each subpage. Since page->mapping and page->compound_nr share a union in struct page, this unexpectedly clears page->compound_nr as well. As a result, the nr_pages computed in hugetlb_cgroup_move_parent() is incorrect and the hugetlb_cgroup cannot be released.
Fix this problem by restoring page->compound_nr using set_compound_order().
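For reference, set_compound_order() in this kernel version is roughly the following (paraphrased from include/linux/mm.h), which is why calling it restores compound_nr together with compound_order; PUD_SHIFT - PAGE_SHIFT is the order of the 1G huge pages managed here.

static inline void set_compound_order(struct page *page, unsigned int order)
{
	page[1].compound_order = order;
	page[1].compound_nr = 1U << order;	/* shares storage with page->mapping in the tail page */
}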
Signed-off-by: Liu Shixin liushixin2@huawei.com Reviewed-by: Kefeng Wangwangkefeng.wang@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- mm/dynamic_hugetlb.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/dynamic_hugetlb.c b/mm/dynamic_hugetlb.c index c1b968a7e668..c6629a8de7c0 100644 --- a/mm/dynamic_hugetlb.c +++ b/mm/dynamic_hugetlb.c @@ -799,7 +799,8 @@ static int free_hugepage_to_hugetlb(struct dhugetlb_pool *hpool) p->mapping = NULL; } set_compound_page_dtor(page, HUGETLB_PAGE_DTOR); - + /* compound_nr and mapping are union in page, reset it. */ + set_compound_order(page, PUD_SHIFT - PAGE_SHIFT); nid = page_to_nid(page); SetHPageFreed(page); list_move(&page->lru, &h->hugepage_freelists[nid]);
From: Zheng Zucheng zhengzucheng@huawei.com
hulk inclusion category: bugfix bugzilla: 18808I3, https://gitee.com/openeuler/kernel/issues/I648XI CVE: NA
-------------------------------
If the extended KABI memory is not initialized, there may be security risks. Therefore, zero the extended KABI memory in the fork path and let users initialize it further as required.
Fixes: 5efc447b7caf ("fork: Allocate a new task_struct_resvd object for fork task") Signed-off-by: Zheng Zucheng zhengzucheng@huawei.com Reviewed-by: Zhang Qiao zhangqiao22@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- kernel/fork.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/fork.c b/kernel/fork.c index 72d2834cc4fd..0ac2ae1e8f85 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -856,7 +856,7 @@ void set_task_stack_end_magic(struct task_struct *tsk) static bool dup_resvd_task_struct(struct task_struct *dst, struct task_struct *orig, int node) { - dst->_resvd = kmalloc_node(sizeof(struct task_struct_resvd), + dst->_resvd = kzalloc_node(sizeof(struct task_struct_resvd), GFP_KERNEL, node); if (!dst->_resvd) return false;
From: Andrzej Hajda andrzej.hajda@intel.com
stable inclusion from stable-v5.10.157 commit 86f0082fb9470904b15546726417f28077088fee category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I640L3 CVE: CVE-2022-4139
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v...
--------------------------------
commit 04aa64375f48a5d430b5550d9271f8428883e550 upstream.
In the case of Gen12 video and compute engines, the TLB_INV registers are masked - to modify one bit, the corresponding bit in the upper half of the register must be enabled, otherwise nothing happens.
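The masked-write convention here is roughly the one implemented by the i915 register helpers (sketched below from i915_reg.h; the exact macro text may differ): the upper 16 bits of the written value select which bits of the lower half the hardware is allowed to change, so writing rb.bit alone to a masked register changes nothing.

#define _MASKED_FIELD(mask, value)	(((mask) << 16) | (value))
#define _MASKED_BIT_ENABLE(a)		_MASKED_FIELD((a), (a))	/* allow and set bit a */
#define _MASKED_BIT_DISABLE(a)		_MASKED_FIELD((a), 0)	/* allow and clear bit a */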
CVE: CVE-2022-4139 Suggested-by: Chris Wilson chris.p.wilson@intel.com Signed-off-by: Andrzej Hajda andrzej.hajda@intel.com Acked-by: Daniel Vetter daniel.vetter@ffwll.ch Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store") Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Ren Zhijie renzhijie2@huawei.com Reviewed-by: Zhang Qiao zhangqiao22@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/gpu/drm/i915/gt/intel_gt.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c index a33887f2464f..5f86d9aacb8a 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt.c +++ b/drivers/gpu/drm/i915/gt/intel_gt.c @@ -745,6 +745,10 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt) if (!i915_mmio_reg_offset(rb.reg)) continue;
+ if (INTEL_GEN(i915) == 12 && (engine->class == VIDEO_DECODE_CLASS || + engine->class == VIDEO_ENHANCEMENT_CLASS)) + rb.bit = _MASKED_BIT_ENABLE(rb.bit); + intel_uncore_write_fw(uncore, rb.reg, rb.bit); }
From: Junxian Huang huangjunxian6@hisilicon.com
driver inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63IM5
---------------------------------------------------------------------------
When setting up RoCE bonding, a new hr_dev will be registered as "hns_bond_xx". In the process, the bonding thread tries to acquire rtnl_lock() while holding roce_bond_mutex. However, it's possible that another thread running bond_netdev_notify_work() grabs rtnl_lock() before the bonding thread and calls the bonding notifier function, in which it tries to acquire roce_bond_mutex, finally leading to a deadlock.
As the notifier chain (notifier_call_chain()) will not call the next notifier callback until the current one returns, there is no need for a mutex in the bonding notifier function. Thus, remove roce_bond_mutex from hns_roce_bond_event() so the deadlock is avoided.
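The cycle described above, sketched as a comment (illustrative only; it restates the scenario, not actual code):

/*
 *   bonding thread                       bond_netdev_notify_work()
 *   --------------                       -------------------------
 *   mutex_lock(&roce_bond_mutex);
 *                                        rtnl_lock();
 *                                        ... notifier chain ...
 *                                        hns_roce_bond_event()
 *                                          mutex_lock(&roce_bond_mutex);  <- waits
 *   register new hr_dev ("hns_bond_xx")
 *     rtnl_lock();                       <- waits: deadlock
 *
 * notifier_call_chain() already runs the callbacks one at a time, so the
 * notifier side needs no extra mutex of its own.
 */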
Fixes: e62a20278f18 ("RDMA/hns: support RoCE bonding") Signed-off-by: Junxian Huang huangjunxian6@hisilicon.com Reviewed-by: Yangyang Li liyangyang20@huawei.com Reviewed-by: Yue Haibing yuehaibing@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/infiniband/hw/hns/hns_roce_bond.c | 31 +++++++++-------------- drivers/infiniband/hw/hns/hns_roce_bond.h | 1 + 2 files changed, 13 insertions(+), 19 deletions(-)
diff --git a/drivers/infiniband/hw/hns/hns_roce_bond.c b/drivers/infiniband/hw/hns/hns_roce_bond.c index 9c8e817a6123..b97b3f68ffc1 100644 --- a/drivers/infiniband/hw/hns/hns_roce_bond.c +++ b/drivers/infiniband/hw/hns/hns_roce_bond.c @@ -39,7 +39,8 @@ bool hns_roce_bond_is_active(struct hns_roce_dev *hr_dev) for_each_netdev_in_bond_rcu(upper_dev, net_dev) { hr_dev = hns_roce_get_hrdev_by_netdev(net_dev); if (hr_dev && hr_dev->bond_grp && - hr_dev->bond_grp->bond_state == HNS_ROCE_BOND_IS_BONDED) { + (hr_dev->bond_grp->bond_state == HNS_ROCE_BOND_REGISTERING || + hr_dev->bond_grp->bond_state == HNS_ROCE_BOND_IS_BONDED)) { rcu_read_unlock(); return true; } @@ -154,7 +155,6 @@ static void hns_roce_set_bond(struct hns_roce_bond_group *bond_grp) int ret; int i;
- hns_roce_bond_get_active_slave(bond_grp); /* bond_grp will be kfree during uninit_instance of main_hr_dev. * Thus the main_hr_dev is switched before the uninit_instance * of the previous main_hr_dev. @@ -165,7 +165,7 @@ static void hns_roce_set_bond(struct hns_roce_bond_group *bond_grp) hns_roce_bond_uninit_client(bond_grp, i); }
- bond_grp->bond_state = HNS_ROCE_BOND_IS_BONDED; + bond_grp->bond_state = HNS_ROCE_BOND_REGISTERING;
for (i = 0; i < ROCE_BOND_FUNC_MAX; i++) { net_dev = bond_grp->bond_func_info[i].net_dev; @@ -183,17 +183,18 @@ static void hns_roce_set_bond(struct hns_roce_bond_group *bond_grp) } } } - if (!hr_dev) return;
hns_roce_bond_uninit_client(bond_grp, main_func_idx); + hns_roce_bond_get_active_slave(bond_grp); ret = hns_roce_cmd_bond(hr_dev, HNS_ROCE_SET_BOND); if (ret) { ibdev_err(&hr_dev->ib_dev, "failed to set RoCE bond!\n"); return; }
+ bond_grp->bond_state = HNS_ROCE_BOND_IS_BONDED; ibdev_info(&hr_dev->ib_dev, "RoCE set bond finished!\n"); }
@@ -239,7 +240,6 @@ static void hns_roce_slave_changestate(struct hns_roce_bond_group *bond_grp) int ret;
hns_roce_bond_get_active_slave(bond_grp); - bond_grp->bond_state = HNS_ROCE_BOND_IS_BONDED;
ret = hns_roce_cmd_bond(bond_grp->main_hr_dev, HNS_ROCE_CHANGE_BOND); if (ret) { @@ -248,6 +248,7 @@ static void hns_roce_slave_changestate(struct hns_roce_bond_group *bond_grp) return; }
+ bond_grp->bond_state = HNS_ROCE_BOND_IS_BONDED; ibdev_info(&bond_grp->main_hr_dev->ib_dev, "RoCE slave changestate finished!\n"); } @@ -258,8 +259,6 @@ static void hns_roce_slave_inc(struct hns_roce_bond_group *bond_grp) u8 inc_func_idx = 0; int ret;
- hns_roce_bond_get_active_slave(bond_grp); - while (inc_slave_map > 0) { if (inc_slave_map & 1) hns_roce_bond_uninit_client(bond_grp, inc_func_idx); @@ -267,8 +266,7 @@ static void hns_roce_slave_inc(struct hns_roce_bond_group *bond_grp) inc_func_idx++; }
- bond_grp->bond_state = HNS_ROCE_BOND_IS_BONDED; - + hns_roce_bond_get_active_slave(bond_grp); ret = hns_roce_cmd_bond(bond_grp->main_hr_dev, HNS_ROCE_CHANGE_BOND); if (ret) { ibdev_err(&bond_grp->main_hr_dev->ib_dev, @@ -276,6 +274,7 @@ static void hns_roce_slave_inc(struct hns_roce_bond_group *bond_grp) return; }
+ bond_grp->bond_state = HNS_ROCE_BOND_IS_BONDED; ibdev_info(&bond_grp->main_hr_dev->ib_dev, "RoCE slave increase finished!\n"); } @@ -290,8 +289,6 @@ static void hns_roce_slave_dec(struct hns_roce_bond_group *bond_grp) int ret; int i;
- hns_roce_bond_get_active_slave(bond_grp); - bond_grp->bond_state = HNS_ROCE_BOND_IS_BONDED;
main_func_idx = PCI_FUNC(bond_grp->main_hr_dev->pci_dev->devfn); @@ -300,6 +297,7 @@ static void hns_roce_slave_dec(struct hns_roce_bond_group *bond_grp) for (i = 0; i < ROCE_BOND_FUNC_MAX; i++) { net_dev = bond_grp->bond_func_info[i].net_dev; if (!(dec_slave_map & (1 << i)) && net_dev) { + bond_grp->bond_state = HNS_ROCE_BOND_REGISTERING; hr_dev = hns_roce_bond_init_client(bond_grp, i); if (hr_dev) { bond_grp->main_hr_dev = hr_dev; @@ -321,6 +319,7 @@ static void hns_roce_slave_dec(struct hns_roce_bond_group *bond_grp) dec_func_idx++; }
+ hns_roce_bond_get_active_slave(bond_grp); if (bond_grp->slave_map_diff & (1 << main_func_idx)) ret = hns_roce_cmd_bond(hr_dev, HNS_ROCE_SET_BOND); else @@ -332,6 +331,7 @@ static void hns_roce_slave_dec(struct hns_roce_bond_group *bond_grp) return; }
+ bond_grp->bond_state = HNS_ROCE_BOND_IS_BONDED; ibdev_info(&bond_grp->main_hr_dev->ib_dev, "RoCE slave decrease finished!\n"); } @@ -608,27 +608,20 @@ int hns_roce_bond_event(struct notifier_block *self, if (event == NETDEV_CHANGELOWERSTATE && !upper_dev && hr_dev != hns_roce_get_hrdev_by_netdev(net_dev)) return NOTIFY_DONE; - if (upper_dev) { if (!hns_roce_is_slave(upper_dev, hr_dev->iboe.netdevs[0])) return NOTIFY_DONE; - - mutex_lock(&roce_bond_mutex); if (!hr_dev->bond_grp) { - if (hns_roce_is_bond_grp_exist(upper_dev)) { - mutex_unlock(&roce_bond_mutex); + if (hns_roce_is_bond_grp_exist(upper_dev)) return NOTIFY_DONE; - } hr_dev->bond_grp = hns_roce_alloc_bond_grp(hr_dev, upper_dev); if (!hr_dev->bond_grp) { ibdev_err(&hr_dev->ib_dev, "failed to alloc RoCE bond_grp!\n"); - mutex_unlock(&roce_bond_mutex); return NOTIFY_DONE; } } - mutex_unlock(&roce_bond_mutex); }
changed = (event == NETDEV_CHANGEUPPER) ? diff --git a/drivers/infiniband/hw/hns/hns_roce_bond.h b/drivers/infiniband/hw/hns/hns_roce_bond.h index a74122fa2587..a222f43ab124 100644 --- a/drivers/infiniband/hw/hns/hns_roce_bond.h +++ b/drivers/infiniband/hw/hns/hns_roce_bond.h @@ -20,6 +20,7 @@ enum { enum hns_roce_bond_state { HNS_ROCE_BOND_NOT_BONDED, HNS_ROCE_BOND_IS_BONDED, + HNS_ROCE_BOND_REGISTERING, HNS_ROCE_BOND_SLAVE_INC, HNS_ROCE_BOND_SLAVE_DEC, HNS_ROCE_BOND_SLAVE_CHANGESTATE,
From: Junxian Huang huangjunxian6@hisilicon.com
driver inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63IM5
---------------------------------------------------------------------------
In the existing hns RoCE code, ib_dev->ops.get_netdev is not set, which causes only one slave to be assigned an IP-based GID in RoCE bonding mode 1.
This patch adds hns_roce_get_netdev() and hooks it up as ib_dev->ops.get_netdev so that the IB core can assign the GID to the proper net device according to the active slave in mode 1.
Fixes: e62a20278f18 ("RDMA/hns: support RoCE bonding") Signed-off-by: Junxian Huang huangjunxian6@hisilicon.com Reviewed-by: Yangyang Li liyangyang20@huawei.com Reviewed-by: Yue Haibing yuehaibing@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/infiniband/hw/hns/hns_roce_bond.c | 46 ++++++++++++++++------- drivers/infiniband/hw/hns/hns_roce_main.c | 25 ++++++++++++ 2 files changed, 58 insertions(+), 13 deletions(-)
diff --git a/drivers/infiniband/hw/hns/hns_roce_bond.c b/drivers/infiniband/hw/hns/hns_roce_bond.c index b97b3f68ffc1..8d79ae73c4ce 100644 --- a/drivers/infiniband/hw/hns/hns_roce_bond.c +++ b/drivers/infiniband/hw/hns/hns_roce_bond.c @@ -26,27 +26,43 @@ static struct hns_roce_dev *hns_roce_get_hrdev_by_netdev(struct net_device *net_ return hr_dev; }
-bool hns_roce_bond_is_active(struct hns_roce_dev *hr_dev) +static struct hns_roce_bond_group *hns_roce_get_bond_grp(struct hns_roce_dev *hr_dev) { + struct hns_roce_bond_group *bond_grp = NULL; struct net_device *upper_dev; struct net_device *net_dev;
- if (!netif_is_lag_port(hr_dev->iboe.netdevs[0])) - return false; - rcu_read_lock(); + upper_dev = netdev_master_upper_dev_get_rcu(hr_dev->iboe.netdevs[0]); + for_each_netdev_in_bond_rcu(upper_dev, net_dev) { hr_dev = hns_roce_get_hrdev_by_netdev(net_dev); - if (hr_dev && hr_dev->bond_grp && - (hr_dev->bond_grp->bond_state == HNS_ROCE_BOND_REGISTERING || - hr_dev->bond_grp->bond_state == HNS_ROCE_BOND_IS_BONDED)) { - rcu_read_unlock(); - return true; + if (hr_dev && hr_dev->bond_grp) { + bond_grp = hr_dev->bond_grp; + break; } } + rcu_read_unlock();
+ return bond_grp; +} + +bool hns_roce_bond_is_active(struct hns_roce_dev *hr_dev) +{ + struct hns_roce_bond_group *bond_grp; + + if (!netif_is_lag_port(hr_dev->iboe.netdevs[0])) + return false; + + bond_grp = hns_roce_get_bond_grp(hr_dev); + + if (bond_grp && + (bond_grp->bond_state == HNS_ROCE_BOND_REGISTERING || + bond_grp->bond_state == HNS_ROCE_BOND_IS_BONDED)) + return true; + return false; }
@@ -62,12 +78,15 @@ struct net_device *hns_roce_get_bond_netdev(struct hns_roce_dev *hr_dev) if (!netif_is_lag_port(hr_dev->iboe.netdevs[0])) return NULL;
- if (!bond_grp) - return NULL; + if (!bond_grp) { + bond_grp = hns_roce_get_bond_grp(hr_dev); + if (!bond_grp) + return NULL; + }
mutex_lock(&bond_grp->bond_mutex);
- if (bond_grp->bond_state != HNS_ROCE_BOND_IS_BONDED) + if (bond_grp->bond_state == HNS_ROCE_BOND_NOT_BONDED) goto out;
if (bond_grp->tx_type == NETDEV_LAG_TX_TYPE_ACTIVEBACKUP) { @@ -155,7 +174,8 @@ static void hns_roce_set_bond(struct hns_roce_bond_group *bond_grp) int ret; int i;
- /* bond_grp will be kfree during uninit_instance of main_hr_dev. + /* + * bond_grp will be kfree during uninit_instance of main_hr_dev. * Thus the main_hr_dev is switched before the uninit_instance * of the previous main_hr_dev. */ diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c index cdfcefb1f660..f8fc6c905e39 100644 --- a/drivers/infiniband/hw/hns/hns_roce_main.c +++ b/drivers/infiniband/hw/hns/hns_roce_main.c @@ -47,6 +47,30 @@ #include "hns_roce_dca.h" #include "hns_roce_debugfs.h"
+static struct net_device *hns_roce_get_netdev(struct ib_device *ib_dev, + u8 port_num) +{ + struct hns_roce_dev *hr_dev = to_hr_dev(ib_dev); + struct net_device *ndev; + + if (port_num < 1 || port_num > hr_dev->caps.num_ports) + return NULL; + + ndev = hr_dev->hw->get_bond_netdev(hr_dev); + + rcu_read_lock(); + + if (!ndev) + ndev = hr_dev->iboe.netdevs[port_num - 1]; + + if (ndev) + dev_hold(ndev); + + rcu_read_unlock(); + + return ndev; +} + static int hns_roce_set_mac(struct hns_roce_dev *hr_dev, u32 port, const u8 *addr) { @@ -677,6 +701,7 @@ static const struct ib_device_ops hns_roce_dev_ops = { .disassociate_ucontext = hns_roce_disassociate_ucontext, .get_dma_mr = hns_roce_get_dma_mr, .get_link_layer = hns_roce_get_link_layer, + .get_netdev = hns_roce_get_netdev, .get_port_immutable = hns_roce_port_immutable, .mmap = hns_roce_mmap, .mmap_free = hns_roce_free_mmap,
From: Junxian Huang huangjunxian6@hisilicon.com
driver inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63IM5
---------------------------------------------------------------------------
With this patch, the RoCE driver will not set up a bond when the NIC bond uses a mode not supported by RoCE bonding, contains VF slaves, or contains slaves from different I/O dies.
Fixes: e62a20278f18 ("RDMA/hns: support RoCE bonding") Signed-off-by: Junxian Huang huangjunxian6@hisilicon.com Reviewed-by: Yangyang Li liyangyang20@huawei.com Reviewed-by: Yue Haibing yuehaibing@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/infiniband/hw/hns/hns_roce_bond.c | 102 +++++++++++++++++----- drivers/infiniband/hw/hns/hns_roce_bond.h | 10 +++ 2 files changed, 92 insertions(+), 20 deletions(-)
diff --git a/drivers/infiniband/hw/hns/hns_roce_bond.c b/drivers/infiniband/hw/hns/hns_roce_bond.c index 8d79ae73c4ce..a5ce6d32e74f 100644 --- a/drivers/infiniband/hw/hns/hns_roce_bond.c +++ b/drivers/infiniband/hw/hns/hns_roce_bond.c @@ -513,13 +513,13 @@ static bool hns_roce_bond_upper_event(struct hns_roce_dev *hr_dev, struct netdev_notifier_changeupper_info *info) { struct hns_roce_bond_group *bond_grp = hr_dev->bond_grp; + struct netdev_lag_upper_info *bond_upper_info = NULL; struct net_device *upper_dev = info->upper_dev; - struct netdev_lag_upper_info *bond_upper_info; - u32 pre_slave_map = bond_grp->slave_map; - u8 pre_slave_num = bond_grp->slave_num; bool changed = false; + u32 pre_slave_map; + u8 pre_slave_num;
- if (!upper_dev || !netif_is_lag_master(upper_dev)) + if (!bond_grp || !upper_dev || !netif_is_lag_master(upper_dev)) return false;
if (info->linking) @@ -530,15 +530,11 @@ static bool hns_roce_bond_upper_event(struct hns_roce_dev *hr_dev, if (bond_upper_info) bond_grp->tx_type = bond_upper_info->tx_type;
+ pre_slave_map = bond_grp->slave_map; + pre_slave_num = bond_grp->slave_num; hns_roce_bond_info_record(bond_grp, upper_dev);
bond_grp->bond = netdev_priv(upper_dev); - if (!hns_roce_bond_mode_is_supported(bond_grp->tx_type) || - bond_grp->slave_num <= 1) { - changed = bond_grp->bond_ready; - bond_grp->bond_ready = false; - goto out; - }
if (bond_grp->bond_state == HNS_ROCE_BOND_NOT_BONDED) { bond_grp->bond_ready = true; @@ -553,7 +549,6 @@ static bool hns_roce_bond_upper_event(struct hns_roce_dev *hr_dev, changed = true; }
-out: mutex_unlock(&bond_grp->bond_mutex);
return changed; @@ -610,30 +605,93 @@ static bool hns_roce_is_bond_grp_exist(struct net_device *upper_dev) return false; }
+static enum bond_support_type + check_bond_support(struct hns_roce_dev *hr_dev, + struct netdev_notifier_changeupper_info *info) +{ + struct netdev_lag_upper_info *bond_upper_info = NULL; + struct net_device *upper_dev = info->upper_dev; + bool bond_grp_exist = false; + struct net_device *net_dev; + bool support = true; + u8 slave_num = 0; + int bus_num = -1; + + if (hr_dev->bond_grp || hns_roce_is_bond_grp_exist(upper_dev)) + bond_grp_exist = true; + + if (!info->linking && !bond_grp_exist) + return BOND_NOT_SUPPORT; + + if (info->linking) + bond_upper_info = info->upper_info; + + if (bond_upper_info && + !hns_roce_bond_mode_is_supported(bond_upper_info->tx_type)) + return BOND_NOT_SUPPORT; + + rcu_read_lock(); + for_each_netdev_in_bond_rcu(upper_dev, net_dev) { + hr_dev = hns_roce_get_hrdev_by_netdev(net_dev); + if (hr_dev) { + slave_num++; + if (bus_num == -1) + bus_num = hr_dev->pci_dev->bus->number; + if (hr_dev->is_vf || + bus_num != hr_dev->pci_dev->bus->number) { + support = false; + break; + } + } + } + rcu_read_unlock(); + + if (slave_num <= 1) + support = false; + + if (support) + return BOND_SUPPORT; + + return bond_grp_exist ? BOND_EXISTING_NOT_SUPPORT : BOND_NOT_SUPPORT; +} + int hns_roce_bond_event(struct notifier_block *self, unsigned long event, void *ptr) { struct net_device *net_dev = netdev_notifier_info_to_dev(ptr); struct hns_roce_dev *hr_dev = container_of(self, struct hns_roce_dev, bond_nb); + struct netdev_notifier_changeupper_info *info; + enum bond_support_type support = BOND_SUPPORT; struct net_device *upper_dev; bool changed;
if (event != NETDEV_CHANGEUPPER && event != NETDEV_CHANGELOWERSTATE) return NOTIFY_DONE;
- rcu_read_lock(); - upper_dev = netdev_master_upper_dev_get_rcu(net_dev); - rcu_read_unlock(); - if (event == NETDEV_CHANGELOWERSTATE && !upper_dev && - hr_dev != hns_roce_get_hrdev_by_netdev(net_dev)) - return NOTIFY_DONE; - if (upper_dev) { + if (event == NETDEV_CHANGEUPPER) { + info = ptr; + support = check_bond_support(hr_dev, info); + if (support == BOND_NOT_SUPPORT) + return NOTIFY_DONE; + upper_dev = info->upper_dev; + } else { + rcu_read_lock(); + upper_dev = netdev_master_upper_dev_get_rcu(net_dev); + rcu_read_unlock(); + if (!upper_dev && + hr_dev != hns_roce_get_hrdev_by_netdev(net_dev)) + return NOTIFY_DONE; + } + + if (event == NETDEV_CHANGEUPPER) { if (!hns_roce_is_slave(upper_dev, hr_dev->iboe.netdevs[0])) return NOTIFY_DONE; + if (!hr_dev->bond_grp) { if (hns_roce_is_bond_grp_exist(upper_dev)) return NOTIFY_DONE; + hr_dev->bond_grp = hns_roce_alloc_bond_grp(hr_dev, upper_dev); if (!hr_dev->bond_grp) { @@ -642,12 +700,16 @@ int hns_roce_bond_event(struct notifier_block *self, return NOTIFY_DONE; } } - }
+ if (support == BOND_EXISTING_NOT_SUPPORT) { + hr_dev->bond_grp->bond_ready = false; + hns_roce_queue_bond_work(hr_dev, HZ); + return NOTIFY_DONE; + } + } changed = (event == NETDEV_CHANGEUPPER) ? hns_roce_bond_upper_event(hr_dev, ptr) : hns_roce_bond_lowerstate_event(hr_dev, ptr); - if (changed) hns_roce_queue_bond_work(hr_dev, HZ);
diff --git a/drivers/infiniband/hw/hns/hns_roce_bond.h b/drivers/infiniband/hw/hns/hns_roce_bond.h index a222f43ab124..3c5a8a770f6c 100644 --- a/drivers/infiniband/hw/hns/hns_roce_bond.h +++ b/drivers/infiniband/hw/hns/hns_roce_bond.h @@ -17,6 +17,16 @@ enum { BOND_MODE_2_4, };
+enum bond_support_type { + BOND_NOT_SUPPORT, + /* + * bond_grp already exists, but in the current + * conditions it's no longer supported + */ + BOND_EXISTING_NOT_SUPPORT, + BOND_SUPPORT, +}; + enum hns_roce_bond_state { HNS_ROCE_BOND_NOT_BONDED, HNS_ROCE_BOND_IS_BONDED,
From: Junxian Huang huangjunxian6@hisilicon.com
driver inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63IM5
---------------------------------------------------------------------------
This patch deletes some unused variables, encapsulates repeated code in a new function, get_upper_dev_from_ndev(), and adjusts the structure of hns_roce_bond_event() to make the logic clearer.
Fixes: e62a20278f18 ("RDMA/hns: support RoCE bonding") Signed-off-by: Junxian Huang huangjunxian6@hisilicon.com Reviewed-by: Yangyang Li liyangyang20@huawei.com Reviewed-by: Yue Haibing yuehaibing@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/infiniband/hw/hns/hns_roce_bond.c | 48 +++++++++++------------ 1 file changed, 23 insertions(+), 25 deletions(-)
diff --git a/drivers/infiniband/hw/hns/hns_roce_bond.c b/drivers/infiniband/hw/hns/hns_roce_bond.c index a5ce6d32e74f..dcaa329fb989 100644 --- a/drivers/infiniband/hw/hns/hns_roce_bond.c +++ b/drivers/infiniband/hw/hns/hns_roce_bond.c @@ -575,8 +575,7 @@ static struct hns_roce_bond_group *hns_roce_alloc_bond_grp(struct hns_roce_dev * return bond_grp; }
-static bool hns_roce_is_slave(struct net_device *bond, - struct net_device *net_dev) +static struct net_device *get_upper_dev_from_ndev(struct net_device *net_dev) { struct net_device *upper_dev;
@@ -584,7 +583,14 @@ static bool hns_roce_is_slave(struct net_device *bond, upper_dev = netdev_master_upper_dev_get_rcu(net_dev); rcu_read_unlock();
- return bond == upper_dev; + return upper_dev; +} + +static bool hns_roce_is_slave(struct net_device *upper_dev, + struct hns_roce_dev *hr_dev) +{ + return (hr_dev->bond_grp && upper_dev == hr_dev->bond_grp->upper_dev) || + upper_dev == get_upper_dev_from_ndev(hr_dev->iboe.netdevs[0]); }
static bool hns_roce_is_bond_grp_exist(struct net_device *upper_dev) @@ -607,17 +613,18 @@ static bool hns_roce_is_bond_grp_exist(struct net_device *upper_dev)
static enum bond_support_type check_bond_support(struct hns_roce_dev *hr_dev, + struct net_device **upper_dev, struct netdev_notifier_changeupper_info *info) { struct netdev_lag_upper_info *bond_upper_info = NULL; - struct net_device *upper_dev = info->upper_dev; bool bond_grp_exist = false; struct net_device *net_dev; bool support = true; u8 slave_num = 0; int bus_num = -1;
- if (hr_dev->bond_grp || hns_roce_is_bond_grp_exist(upper_dev)) + *upper_dev = info->upper_dev; + if (hr_dev->bond_grp || hns_roce_is_bond_grp_exist(*upper_dev)) bond_grp_exist = true;
if (!info->linking && !bond_grp_exist) @@ -631,7 +638,7 @@ static enum bond_support_type return BOND_NOT_SUPPORT;
rcu_read_lock(); - for_each_netdev_in_bond_rcu(upper_dev, net_dev) { + for_each_netdev_in_bond_rcu(*upper_dev, net_dev) { hr_dev = hns_roce_get_hrdev_by_netdev(net_dev); if (hr_dev) { slave_num++; @@ -648,7 +655,6 @@ static enum bond_support_type
if (slave_num <= 1) support = false; - if (support) return BOND_SUPPORT;
@@ -661,7 +667,6 @@ int hns_roce_bond_event(struct notifier_block *self, struct net_device *net_dev = netdev_notifier_info_to_dev(ptr); struct hns_roce_dev *hr_dev = container_of(self, struct hns_roce_dev, bond_nb); - struct netdev_notifier_changeupper_info *info; enum bond_support_type support = BOND_SUPPORT; struct net_device *upper_dev; bool changed; @@ -670,28 +675,22 @@ int hns_roce_bond_event(struct notifier_block *self, return NOTIFY_DONE;
if (event == NETDEV_CHANGEUPPER) { - info = ptr; - support = check_bond_support(hr_dev, info); + support = check_bond_support(hr_dev, &upper_dev, ptr); if (support == BOND_NOT_SUPPORT) return NOTIFY_DONE; - upper_dev = info->upper_dev; } else { - rcu_read_lock(); - upper_dev = netdev_master_upper_dev_get_rcu(net_dev); - rcu_read_unlock(); - if (!upper_dev && - hr_dev != hns_roce_get_hrdev_by_netdev(net_dev)) - return NOTIFY_DONE; + upper_dev = get_upper_dev_from_ndev(net_dev); }
- if (event == NETDEV_CHANGEUPPER) { - if (!hns_roce_is_slave(upper_dev, hr_dev->iboe.netdevs[0])) - return NOTIFY_DONE; + if (upper_dev && !hns_roce_is_slave(upper_dev, hr_dev)) + return NOTIFY_DONE; + else if (!upper_dev && hr_dev != hns_roce_get_hrdev_by_netdev(net_dev)) + return NOTIFY_DONE;
+ if (event == NETDEV_CHANGEUPPER) { if (!hr_dev->bond_grp) { if (hns_roce_is_bond_grp_exist(upper_dev)) return NOTIFY_DONE; - hr_dev->bond_grp = hns_roce_alloc_bond_grp(hr_dev, upper_dev); if (!hr_dev->bond_grp) { @@ -700,16 +699,15 @@ int hns_roce_bond_event(struct notifier_block *self, return NOTIFY_DONE; } } - if (support == BOND_EXISTING_NOT_SUPPORT) { hr_dev->bond_grp->bond_ready = false; hns_roce_queue_bond_work(hr_dev, HZ); return NOTIFY_DONE; } + changed = hns_roce_bond_upper_event(hr_dev, ptr); + } else { + changed = hns_roce_bond_lowerstate_event(hr_dev, ptr); } - changed = (event == NETDEV_CHANGEUPPER) ? - hns_roce_bond_upper_event(hr_dev, ptr) : - hns_roce_bond_lowerstate_event(hr_dev, ptr); if (changed) hns_roce_queue_bond_work(hr_dev, HZ);
From: Yang Shen shenyang39@huawei.com
mainline inclusion from mainline-v6.2-rc1 commit 5fefef85b0d3 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5YCYK CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux.git/commit/?...
--------------------------------------------------------------------------
cpuhp_state_add_instance() and cpuhp_state_remove_instance() should be used in pairs. Otherwise cpuhp_remove_multi_state() triggers a warning, since the cpuhp_step instance list is not empty.
The following is the error log with 'rmmod coresight-trbe': Error: Removing state 215 which has instances left. Call trace: __cpuhp_remove_state_cpuslocked+0x144/0x160 __cpuhp_remove_state+0xac/0x100 arm_trbe_device_remove+0x2c/0x60 [coresight_trbe] platform_remove+0x34/0x70 device_remove+0x54/0x90 device_release_driver_internal+0x1e4/0x250 driver_detach+0x5c/0xb0 bus_remove_driver+0x64/0xc0 driver_unregister+0x3c/0x70 platform_driver_unregister+0x20/0x30 arm_trbe_exit+0x1c/0x658 [coresight_trbe] __arm64_sys_delete_module+0x1ac/0x24c invoke_syscall+0x50/0x120 el0_svc_common.constprop.0+0x58/0x1a0 do_el0_svc+0x38/0xd0 el0_svc+0x2c/0xc0 el0t_64_sync_handler+0x1ac/0x1b0 el0t_64_sync+0x19c/0x1a0 ---[ end trace 0000000000000000 ]---
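A sketch of the required pairing (simplified; the callback and state names below are placeholders, not the driver's real ones): every instance added to the dynamically allocated state must be removed before the state itself is freed, which is what the one-line fix below adds to the remove path.

static int trbe_setup_cpuhp(struct trbe_drvdata *drvdata)
{
	int ret;

	/* Allocate a dynamic multi-instance hotplug state. */
	ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "arm/trbe:online",
				      trbe_online_cpu, trbe_offline_cpu);
	if (ret < 0)
		return ret;
	drvdata->trbe_online = ret;
	/* Register this device instance with the state. */
	return cpuhp_state_add_instance(drvdata->trbe_online,
					&drvdata->hotplug_node);
}

static void trbe_remove_cpuhp(struct trbe_drvdata *drvdata)
{
	/* Unregister the instance first ... */
	cpuhp_state_remove_instance(drvdata->trbe_online, &drvdata->hotplug_node);
	/* ... then it is safe to free the dynamic state. */
	cpuhp_remove_multi_state(drvdata->trbe_online);
}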
Fixes: 3fbf7f011f24 ("coresight: sink: Add TRBE driver") Signed-off-by: Yang Shen shenyang39@huawei.com Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com Link: https://lore.kernel.org/r/20221122090355.23533-1-shenyang39@huawei.com Signed-off-by: Junhao He hejunhao3@huawei.com Reviewed-by: Xiongfeng Wang wangxiongfeng2@huawei.com Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com --- drivers/hwtracing/coresight/coresight-trbe.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index 277dfae76931..a84f4a51e08a 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -1094,6 +1094,7 @@ static int arm_trbe_probe_cpuhp(struct trbe_drvdata *drvdata)
 
 static void arm_trbe_remove_cpuhp(struct trbe_drvdata *drvdata)
 {
+	cpuhp_state_remove_instance(drvdata->trbe_online, &drvdata->hotplug_node);
 	cpuhp_remove_multi_state(drvdata->trbe_online);
 }
From: Wangming Shao shaowangming@h-partners.com
driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63WFC
-------------------------------------------------------------------
Fix the type error caused by the duplicated definition of the def_domain_type callback. The CONFIG_SMMU_BYPASS_DEV handling is now folded into arm_smmu_def_domain_type(), so only one callback is installed in arm_smmu_ops.
Signed-off-by: Wangming Shao shaowangming@h-partners.com
Reviewed-by: Hanjun Guo guohanjun@huawei.com
Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 54 ++++++++++-----------
 1 file changed, 26 insertions(+), 28 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 05be3a76e10f..3f6d25bf0587 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3662,29 +3662,6 @@ static int arm_smmu_aux_get_pasid(struct iommu_domain *domain, struct device *de
 	return smmu_domain->ssid ?: -EINVAL;
 }
 
-#ifdef CONFIG_SMMU_BYPASS_DEV
-static int arm_smmu_device_domain_type(struct device *dev)
-{
-	int i;
-	struct pci_dev *pdev;
-
-	if (!dev_is_pci(dev))
-		return 0;
-
-	pdev = to_pci_dev(dev);
-	for (i = 0; i < smmu_bypass_devices_num; i++) {
-		if ((smmu_bypass_devices[i].vendor == pdev->vendor) &&
-		    (smmu_bypass_devices[i].device == pdev->device)) {
-			dev_info(dev, "device 0x%hx:0x%hx uses identity mapping.",
-				 pdev->vendor, pdev->device);
-			return IOMMU_DOMAIN_IDENTITY;
-		}
-	}
-
-	return 0;
-}
-#endif
-
 static int arm_smmu_set_mpam(struct arm_smmu_device *smmu,
 			     int sid, int ssid, int partid, int pmg, int s1mpam)
 {
@@ -3933,16 +3910,40 @@ static int arm_smmu_device_set_config(struct device *dev, int type, void *data)
 #define IS_HISI_PTT_DEVICE(pdev)	((pdev)->vendor == PCI_VENDOR_ID_HUAWEI && \
 					 (pdev)->device == 0xa12e)
 
+#ifdef CONFIG_SMMU_BYPASS_DEV
+static int arm_smmu_bypass_dev_domain_type(struct device *dev)
+{
+	int i;
+	struct pci_dev *pdev = to_pci_dev(dev);
+
+	for (i = 0; i < smmu_bypass_devices_num; i++) {
+		if ((smmu_bypass_devices[i].vendor == pdev->vendor) &&
+		    (smmu_bypass_devices[i].device == pdev->device)) {
+			dev_info(dev, "device 0x%hx:0x%hx uses identity mapping.",
+				 pdev->vendor, pdev->device);
+			return IOMMU_DOMAIN_IDENTITY;
+		}
+	}
+
+	return 0;
+}
+#endif
+
 static int arm_smmu_def_domain_type(struct device *dev)
 {
+	int ret = 0;
+
 	if (dev_is_pci(dev)) {
 		struct pci_dev *pdev = to_pci_dev(dev);
 
 		if (IS_HISI_PTT_DEVICE(pdev))
 			return IOMMU_DOMAIN_IDENTITY;
+	#ifdef CONFIG_SMMU_BYPASS_DEV
+		ret = arm_smmu_bypass_dev_domain_type(dev);
+	#endif
 	}
 
-	return 0;
+	return ret;
 }
 
 static struct iommu_ops arm_smmu_ops = {
@@ -3979,9 +3980,6 @@ static struct iommu_ops arm_smmu_ops = {
 	.aux_attach_dev = arm_smmu_aux_attach_dev,
 	.aux_detach_dev = arm_smmu_aux_detach_dev,
 	.aux_get_pasid = arm_smmu_aux_get_pasid,
-#ifdef CONFIG_SMMU_BYPASS_DEV
-	.def_domain_type = arm_smmu_device_domain_type,
-#endif
 	.dev_get_config = arm_smmu_device_get_config,
 	.dev_set_config = arm_smmu_device_set_config,
 	.pgsize_bitmap = -1UL, /* Restricted during device attach */
@@ -4167,7 +4165,7 @@ static int arm_smmu_prepare_init_l2_strtab(struct device *dev, void *data)
 	struct pci_dev *pdev;
 	struct arm_smmu_device *smmu = (struct arm_smmu_device *)data;
 
-	if (!arm_smmu_device_domain_type(dev))
+	if (!arm_smmu_def_domain_type(dev))
 		return 0;
 
 	pdev = to_pci_dev(dev);
From: Li Lingfeng lilingfeng3@huawei.com
hulk inclusion
category: performance
bugzilla: https://gitee.com/openeuler/kernel/issues/I65DCK
CVE: NA
-------------------------------
This reverts commit 372370dcdfb21ffc2bc99ba431841c9fd627159d.
There is no evidence that buffer_uptodate() and set_buffer_uptodate() are unreliable without the extra memory barriers. Moreover, the reverted commit causes a performance regression.
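For orientation, the BUFFER_FNS()-generated accessors that the revert falls back to are plain bitops with no extra barriers. A paraphrased sketch of the expansion of BUFFER_FNS(Uptodate, uptodate) follows; it is not a verbatim copy of the header, and the real macro may first test the bit to skip a redundant atomic op:

/* Paraphrased expansion of BUFFER_FNS(Uptodate, uptodate). */
static __always_inline void set_buffer_uptodate(struct buffer_head *bh)
{
	set_bit(BH_Uptodate, &bh->b_state);
}

static __always_inline void clear_buffer_uptodate(struct buffer_head *bh)
{
	clear_bit(BH_Uptodate, &bh->b_state);
}

static __always_inline int buffer_uptodate(const struct buffer_head *bh)
{
	return test_bit(BH_Uptodate, &bh->b_state);
}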
Signed-off-by: Li Lingfeng lilingfeng3@huawei.com
Reviewed-by: Yang Erkun yangerkun@huawei.com
Reviewed-by: Zhang Yi yi.zhang@huawei.com
Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com
---
 include/linux/buffer_head.h | 25 +------------------------
 1 file changed, 1 insertion(+), 24 deletions(-)
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 20a2ff1c07a1..6b47f94378c5 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -117,6 +117,7 @@ static __always_inline int test_clear_buffer_##name(struct buffer_head *bh) \
  * of the form "mark_buffer_foo()".  These are higher-level functions which
  * do something in addition to setting a b_state bit.
  */
+BUFFER_FNS(Uptodate, uptodate)
 BUFFER_FNS(Dirty, dirty)
 TAS_BUFFER_FNS(Dirty, dirty)
 BUFFER_FNS(Lock, locked)
@@ -134,30 +135,6 @@ BUFFER_FNS(Meta, meta)
 BUFFER_FNS(Prio, prio)
 BUFFER_FNS(Defer_Completion, defer_completion)
 
-static __always_inline void set_buffer_uptodate(struct buffer_head *bh)
-{
-	/*
-	 * make it consistent with folio_mark_uptodate
-	 * pairs with smp_load_acquire in buffer_uptodate
-	 */
-	smp_mb__before_atomic();
-	set_bit(BH_Uptodate, &bh->b_state);
-}
-
-static __always_inline void clear_buffer_uptodate(struct buffer_head *bh)
-{
-	clear_bit(BH_Uptodate, &bh->b_state);
-}
-
-static __always_inline int buffer_uptodate(const struct buffer_head *bh)
-{
-	/*
-	 * make it consistent with folio_test_uptodate
-	 * pairs with smp_mb__before_atomic in set_buffer_uptodate
-	 */
-	return (smp_load_acquire(&bh->b_state) & (1UL << BH_Uptodate)) != 0;
-}
-
 #define bh_offset(bh)		((unsigned long)(bh)->b_data & ~PAGE_MASK)
 
 /* If we *know* page->private refers to buffer_heads */
From: Jialin Zhang zhangjialin11@huawei.com
hulk inclusion
category: performance
bugzilla: 32059, https://gitee.com/openeuler/kernel/issues/I65DOZ
CVE: NA
--------------------------------
This option optimizes the scheduler for common desktop workloads by automatically creating and populating task groups. This separation of workloads isolates aggressive CPU burners (like build jobs) from desktop applications. Task group autogeneration is currently based upon task session.
Mostly-server workloads do not need this, so disable it by default. If you really need the feature, enable it via sysctl:
sysctl -w kernel.sched_autogroup_enabled=1
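For context, the knob maps onto sysctl_sched_autogroup_enabled (the variable changed below) through a sysctl table entry along the following lines; this is paraphrased from the 5.10-era kernel/sysctl.c for orientation, not copied verbatim:

#include <linux/sysctl.h>

/* Sketch of how the knob is exposed; only values 0 and 1 are accepted. */
extern unsigned int sysctl_sched_autogroup_enabled;

static struct ctl_table autogroup_ctl[] = {
	{
		.procname	= "sched_autogroup_enabled",
		.data		= &sysctl_sched_autogroup_enabled,
		.maxlen		= sizeof(unsigned int),
		.mode		= 0644,
		.proc_handler	= proc_dointvec_minmax,
		.extra1		= SYSCTL_ZERO,
		.extra2		= SYSCTL_ONE,
	},
	{ }
};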
Signed-off-by: Jialin Zhang zhangjialin11@huawei.com
Reviewed-by: Xie XiuQi xiexiuqi@huawei.com
Signed-off-by: Zheng Zengkai zhengzengkai@huawei.com
---
 kernel/sched/autogroup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/autogroup.c b/kernel/sched/autogroup.c
index 2067080bb235..bcb2bb80919a 100644
--- a/kernel/sched/autogroup.c
+++ b/kernel/sched/autogroup.c
@@ -5,7 +5,7 @@
 #include <linux/nospec.h>
 #include "sched.h"
 
-unsigned int __read_mostly sysctl_sched_autogroup_enabled = 1;
+unsigned int __read_mostly sysctl_sched_autogroup_enabled;
 static struct autogroup autogroup_default;
 static atomic_t autogroup_seq_nr;