hulk inclusion
category: performance
bugzilla: https://gitee.com/openeuler/kernel/issues/I9MSNE?from=project-issue
CVE: NA
--------------------------------
The high-frequency function cpu_util_without() is only called when a task
is woken up or created for the first time. In this scenario, performance
can be improved by simplifying the function.
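In effect, the simplified function skips removing @p's own contribution
(as the generic cpu_util() path would) and directly returns
min(max(util_avg, util_est), capacity_orig) of the CPU's cfs_rq; on the
wakeup/fork path @p is not expected to be contributing yet, so this is
considered an acceptable approximation (see the diff below).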
Here are the detailed test results of unixbench.
Command: ./Run -c 1 -i 3
Without Patch
------------------------------------------------------------------------
System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   41898849.1   3590.3
Double-Precision Whetstone                       55.0       4426.3    804.8
Execl Throughput                                 43.0       2828.8    657.9
File Copy 1024 bufsize 2000 maxblocks          3960.0     837180.0   2114.1
File Copy 256 bufsize 500 maxblocks            1655.0     256669.0   1550.9
File Copy 4096 bufsize 8000 maxblocks          5800.0    2264169.0   3903.7
Pipe Throughput                               12440.0    1101364.7    885.3
Pipe-based Context Switching                   4000.0     136573.4    341.4
Process Creation                                126.0       6031.7    478.7
Shell Scripts (1 concurrent)                     42.4       5875.9   1385.8
Shell Scripts (8 concurrent)                      6.0       2567.1   4278.5
System Call Overhead                          15000.0    1065481.3    710.3
                                                                    ========
System Benchmarks Index Score                                         1252.0
With Patch
------------------------------------------------------------------------
System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   41832459.9   3584.6
Double-Precision Whetstone                       55.0       4426.6    804.8
Execl Throughput                                 43.0       2675.8    622.3
File Copy 1024 bufsize 2000 maxblocks          3960.0     919862.0   2322.9
File Copy 256 bufsize 500 maxblocks            1655.0     274966.0   1661.4
File Copy 4096 bufsize 8000 maxblocks          5800.0    2350539.0   4052.7
Pipe Throughput                               12440.0    1182284.3    950.4
Pipe-based Context Switching                   4000.0     155034.4    387.6
Process Creation                                126.0       6371.9    505.7
Shell Scripts (1 concurrent)                     42.4       5797.9   1367.4
Shell Scripts (8 concurrent)                      6.0       2576.7   4294.4
System Call Overhead                          15000.0    1128173.1    752.1
                                                                    ========
System Benchmarks Index Score                                         1299.1
After the lmbench test, we can get a 0.8% ~ 6.5% performance improvement
for lmbench fork_proc/exec_proc/shell_proc.
The test results are as follows:
                  base     base+this patch
fork_proc        457ms      427ms (6.5%)
exec_proc       2008ms     1991ms (0.8%)
shell_proc      3062ms     2985ms (2.5%)
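(The percentages above are the relative time reduction, e.g.
(457 - 427) / 457 ≈ 6.5% for fork_proc.)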
Signed-off-by: Zhang Qiao <zhangqiao22(a)huawei.com>
Signed-off-by: Li Zetao <lizetao1(a)huawei.com>
---
kernel/sched/fair.c | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7b0cb2f090da..010dbf2047e5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8530,13 +8530,23 @@ unsigned long cpu_util_cfs_boost(int cpu)
* utilization of the specified task, whenever the task is currently
* contributing to the CPU utilization.
*/
-static unsigned long cpu_util_without(int cpu, struct task_struct *p)
+static inline unsigned long cpu_util_without(int cpu, struct task_struct *p)
{
- /* Task has no contribution or is new */
- if (cpu != task_cpu(p) || !READ_ONCE(p->se.avg.last_update_time))
- p = NULL;
+ struct cfs_rq *cfs_rq = &cpu_rq(cpu)->cfs;
+ unsigned long util = READ_ONCE(cfs_rq->avg.util_avg);
+ /*
+ * As an optimization, @p's own contribution is not removed here:
+ * cpu_util_without() is only called on the wakeup/fork path, where
+ * @p is not expected to be contributing to @cpu's util_avg, so the
+ * rq utilization (boosted by util_est) is used directly.
+ */
+ if (sched_feat(UTIL_EST)) {
+ unsigned long util_est;
+ util_est = READ_ONCE(cfs_rq->avg.util_est.enqueued);
+ util = max(util, util_est);
+ }
- return cpu_util(cpu, p, -1, 0);
+ return min(util, capacity_orig_of(cpu));
}
/*
--
2.34.1
From: Peng Zhang <zhangpeng.00(a)bytedance.com>
mainline inclusion
from mainline-v6.8-rc1
commit 7e552dcd803f4ff60165271c573ab2e38d15769f
category: performance
bugzilla: https://gitee.com/openeuler/kernel/issues/I9N4V1
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
----------------------------------------------------------------------
The last range stored in a maple tree is typically quite large. By checking
whether it exceeds the sum of the remaining ranges in that node, it is
possible to avoid checking all the other gaps.
Running the maple tree test suite in user mode almost always results in a
near 100% hit rate for this optimization.
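In mas_leaf_max_gap() terms, pivots[max_piv] - mas->min is the total span
covered by the remaining slots of the leaf, so once the trailing gap exceeds
that value no other per-slot gap can be larger and the scan can return early
(see the added check below).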
Link: https://lkml.kernel.org/r/20231215074632.82045-1-zhangpeng.00@bytedance.com
Signed-off-by: Peng Zhang <zhangpeng.00(a)bytedance.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Li Zetao <lizetao1(a)huawei.com>
---
lib/maple_tree.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 98b3ded67d06..da227397e4d8 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -1547,6 +1547,9 @@ static unsigned long mas_leaf_max_gap(struct ma_state *mas)
gap = ULONG_MAX - pivots[max_piv];
if (gap > max_gap)
max_gap = gap;
+
+ if (max_gap > pivots[max_piv] - mas->min)
+ return max_gap;
}
for (; i <= max_piv; i++) {
--
2.34.1
From: Dave Airlie <airlied(a)redhat.com>
stable inclusion
from stable-v5.10.214
commit 13d76b2f443dc371842916dd8768009ff1594716
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I9LK8M
CVE: CVE-2024-26984
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…
--------------------------------
commit fff1386cc889d8fb4089d285f883f8cba62d82ce upstream.
Running a lot of VK CTS in parallel against nouveau, once every
few hours you might see something like this crash.
BUG: kernel NULL pointer dereference, address: 0000000000000008
PGD 8000000114e6e067 P4D 8000000114e6e067 PUD 109046067 PMD 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 7 PID: 53891 Comm: deqp-vk Not tainted 6.8.0-rc6+ #27
Hardware name: Gigabyte Technology Co., Ltd. Z390 I AORUS PRO WIFI/Z390 I AORUS PRO WIFI-CF, BIOS F8 11/05/2021
RIP: 0010:gp100_vmm_pgt_mem+0xe3/0x180 [nouveau]
Code: c7 48 01 c8 49 89 45 58 85 d2 0f 84 95 00 00 00 41 0f b7 46 12 49 8b 7e 08 89 da 42 8d 2c f8 48 8b 47 08 41 83 c7 01 48 89 ee <48> 8b 40 08 ff d0 0f 1f 00 49 8b 7e 08 48 89 d9 48 8d 75 04 48 c1
RSP: 0000:ffffac20c5857838 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 00000000004d8001 RCX: 0000000000000001
RDX: 00000000004d8001 RSI: 00000000000006d8 RDI: ffffa07afe332180
RBP: 00000000000006d8 R08: ffffac20c5857ad0 R09: 0000000000ffff10
R10: 0000000000000001 R11: ffffa07af27e2de0 R12: 000000000000001c
R13: ffffac20c5857ad0 R14: ffffa07a96fe9040 R15: 000000000000001c
FS: 00007fe395eed7c0(0000) GS:ffffa07e2c980000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 000000011febe001 CR4: 00000000003706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
...
? gp100_vmm_pgt_mem+0xe3/0x180 [nouveau]
? gp100_vmm_pgt_mem+0x37/0x180 [nouveau]
nvkm_vmm_iter+0x351/0xa20 [nouveau]
? __pfx_nvkm_vmm_ref_ptes+0x10/0x10 [nouveau]
? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
? __lock_acquire+0x3ed/0x2170
? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
nvkm_vmm_ptes_get_map+0xc2/0x100 [nouveau]
? __pfx_nvkm_vmm_ref_ptes+0x10/0x10 [nouveau]
? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
nvkm_vmm_map_locked+0x224/0x3a0 [nouveau]
Adding any sort of useful debug usually makes it go away, so I hand
wrote the function in a line, and debugged the asm.
Every so often pt->memory->ptrs is NULL. This ptrs ptr is set in
the nv50_instobj_acquire called from nvkm_kmap.
If Thread A and Thread B both get to nv50_instobj_acquire around
the same time, and Thread A hits the refcount_set line, and in
lockstep thread B succeeds at refcount_inc_not_zero, there is a
chance the ptrs value won't have been stored since refcount_set
is unordered. Force a memory barrier here, I picked smp_mb, since
we want it on all CPUs and it's write followed by a read.
v2: use paired smp_rmb/smp_wmb.
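In barrier terms, the intended pairing looks roughly like this (a simplified
sketch of the publish/consume pattern, not the literal nouveau code; obj and
use() are placeholders):
  /* publish side (slow path, under the lock) */
  obj->ptrs = &nv50_instobj_slow;   /* or nv50_instobj_fast */
  smp_wmb();                        /* order the ptrs store before the refcount store */
  refcount_set(&obj->maps, 1);
  /* fast path */
  if (refcount_inc_not_zero(&obj->maps)) {
          smp_rmb();                /* pairs with the smp_wmb() above */
          use(obj->ptrs);           /* guaranteed to observe the ptrs store */
  }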
Cc: <stable(a)vger.kernel.org>
Fixes: be55287aa5ba ("drm/nouveau/imem/nv50: embed nvkm_instobj directly into nv04_instobj")
Signed-off-by: Dave Airlie <airlied(a)redhat.com>
Signed-off-by: Danilo Krummrich <dakr(a)redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240411011510.2546857-1-airl…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Yipeng Zou <zouyipeng(a)huawei.com>
---
drivers/gpu/drm/nouveau/nvkm/subdev/instmem/nv50.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/nv50.c b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/nv50.c
index db48a1daca0c..f8ca79eaa7f7 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/nv50.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/nv50.c
@@ -221,8 +221,11 @@ nv50_instobj_acquire(struct nvkm_memory *memory)
void __iomem *map = NULL;
/* Already mapped? */
- if (refcount_inc_not_zero(&iobj->maps))
+ if (refcount_inc_not_zero(&iobj->maps)) {
+ /* read barrier matches the wmb on refcount set */
+ smp_rmb();
return iobj->map;
+ }
/* Take the lock, and re-check that another thread hasn't
* already mapped the object in the meantime.
@@ -249,6 +252,8 @@ nv50_instobj_acquire(struct nvkm_memory *memory)
iobj->base.memory.ptrs = &nv50_instobj_fast;
else
iobj->base.memory.ptrs = &nv50_instobj_slow;
+ /* barrier to ensure the ptrs are written before refcount is set */
+ smp_wmb();
refcount_set(&iobj->maps, 1);
}
--
2.34.1