mailweb.openeuler.org
Kernel

kernel@openeuler.org

  • 26 participants
  • 18030 discussions
[PATCH kernel-4.19] block: dio: ensure the memory order between bi_private and bi_css
by Yang Yingliang 26 Jun '21

From: Hou Tao <houtao1(a)huawei.com>

hulk inclusion
category: bugfix
bugzilla: 167067
CVE: NA

--------------------------------

In __blkdev_direct_IO_simple(), when bi_private is NULL, it assumes
bi_css must be NULL as well, as shown below:

CPU 1:                                   CPU 2:
__blkdev_direct_IO_simple                submit_bio
                                         bio_endio
                                           bio_uninit(bio)
                                             css_put(bi_css)
                                             bi_css = NULL
set_current_state(TASK_UNINTERRUPTIBLE)
                                           bio->bi_end_io
                                             blkdev_bio_end_io_simple
                                               bio->bi_private = NULL
// bi_private is NULL
READ_ONCE(bio->bi_private)                     wake_up_process
                                                 smp_mb__after_spinlock
bio_uninit(bio)
  // read bi_css as non-NULL
  css_put(bi_css)

Because there is no memory barrier between the reads and the writes of
these two variables, the assumption does not hold on machines with a
weak memory model (e.g. arm64): bi_css will be put twice, leading to
the following warning:

  percpu_ref_switch_to_atomic_rcu: percpu ref (css_release) <= 0 (-3)
  after switching to atomic

There is a similar problem in __blkdev_direct_IO(), between dio->waiter
and bio.bi_status. Fix both by adding an smp_rmb() between the reads of
the two variables, and a corresponding smp_wmb() between the writes.

Signed-off-by: Hou Tao <houtao1(a)huawei.com>
Signed-off-by: Yu Kuai <yukuai3(a)huawei.com>
Reviewed-by: Jason Yan <yanaijie(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
 fs/block_dev.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 786d105692e85..30dd7b19bd2e3 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -195,6 +195,11 @@ static void blkdev_bio_end_io_simple(struct bio *bio)
 {
 	struct task_struct *waiter = bio->bi_private;
 
+	/*
+	 * Paired with smp_rmb() after reading bio->bi_private
+	 * in __blkdev_direct_IO_simple()
+	 */
+	smp_wmb();
 	WRITE_ONCE(bio->bi_private, NULL);
 	wake_up_process(waiter);
 }
@@ -251,8 +256,14 @@ __blkdev_direct_IO_simple(struct kiocb *iocb, struct iov_iter *iter,
 	qc = submit_bio(&bio);
 	for (;;) {
 		set_current_state(TASK_UNINTERRUPTIBLE);
-		if (!READ_ONCE(bio.bi_private))
+		if (!READ_ONCE(bio.bi_private)) {
+			/*
+			 * Paired with smp_wmb() in
+			 * blkdev_bio_end_io_simple()
+			 */
+			smp_rmb();
 			break;
+		}
 		if (!(iocb->ki_flags & IOCB_HIPRI) ||
 		    !blk_poll(bdev_get_queue(bdev), qc))
 			io_schedule();
@@ -317,6 +328,12 @@ static void blkdev_bio_end_io(struct bio *bio)
 	} else {
 		struct task_struct *waiter = dio->waiter;
 
+		if (!dio->multi_bio)
+			/*
+			 * Paired with smp_rmb() after reading
+			 * dio->waiter in __blkdev_direct_IO()
+			 */
+			smp_wmb();
 		WRITE_ONCE(dio->waiter, NULL);
 		wake_up_process(waiter);
 	}
@@ -417,8 +434,11 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_pages)
 
 	for (;;) {
 		set_current_state(TASK_UNINTERRUPTIBLE);
-		if (!READ_ONCE(dio->waiter))
+		if (!READ_ONCE(dio->waiter)) {
+			/* Paired with smp_wmb() in blkdev_bio_end_io() */
+			smp_rmb();
 			break;
+		}
 
 		if (!(iocb->ki_flags & IOCB_HIPRI) ||
 		    !blk_poll(bdev_get_queue(bdev), qc))
-- 
2.25.1
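
The pairing rule the patch relies on is general: a write barrier between
two stores only helps if the reader issues a matching read barrier
between the corresponding loads. Below is a minimal userspace C11 sketch
of the same pattern (an illustrative analogue using atomic_thread_fence,
not the kernel code itself): the completer publishes a payload, issues
the smp_wmb() analogue, then clears the flag; the waiter polls the flag,
issues the smp_rmb() analogue, then reads the payload.

  /* Userspace C11 analogue of the smp_wmb()/smp_rmb() pairing
   * (assumption: illustrative sketch, not the kernel code itself). */
  #include <stdatomic.h>
  #include <pthread.h>
  #include <stdio.h>

  static int payload;             /* plays the role of bi_css / bi_status */
  static atomic_int waiting = 1;  /* plays the role of bio->bi_private    */

  static void *completer(void *arg)
  {
      (void)arg;
      payload = 42;                                 /* store A          */
      atomic_thread_fence(memory_order_release);    /* ~ smp_wmb()      */
      atomic_store_explicit(&waiting, 0,
                            memory_order_relaxed);  /* store B          */
      return NULL;
  }

  int main(void)
  {
      pthread_t t;

      pthread_create(&t, NULL, completer, NULL);
      while (atomic_load_explicit(&waiting,
                                  memory_order_relaxed))  /* load B     */
          ;                                               /* busy-wait  */
      atomic_thread_fence(memory_order_acquire);          /* ~ smp_rmb() */
      printf("payload = %d\n", payload);  /* load A: now guaranteed 42  */
      pthread_join(t, NULL);
      return 0;
  }

On x86, both fences cost nothing beyond a compiler barrier for this
store-store/load-load case, which is likely why the unordered code
appeared to work there; on arm64 they compile to dmb instructions.
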
[PATCH openEuler-1.0-LTS] btrfs: allow btrfs_truncate_block() to fallback to nocow for data space reservation
by Cheng Jian 25 Jun '21

From: Qu Wenruo <wqu(a)suse.com>

mainline inclusion
from mainline-5.13.0-rc5
commit 6d4572a9d71d5fc2affee0258d8582d39859188c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I39MZM
CVE: NA

------------------------------------------------------

From: Gou Hao <gouhao(a)uniontech.com>

btrfs: make btrfs_truncate_block check NOCOW attribute

Compared with the mainline change, two pieces were not merged. The
mainline code adds a nowait parameter to btrfs_check_can_nocow, while
this backport only renames check_can_nocow. Mainline also calls
btrfs_drew_write_unlock() at the end of btrfs_truncate_block(), but the
4.19 kernel does not have this function. The code at the Reference
address does not merge these two pieces either.

------------------------------------------------------

btrfs: allow btrfs_truncate_block() to fallback to nocow for data space
reservation

[BUG]
When the data space is exhausted, even if the inode has the NOCOW
attribute, we will still refuse to truncate an unaligned range due to
ENOSPC.

The following script can reproduce it pretty easily:

  #!/bin/bash

  dev=/dev/test/test
  mnt=/mnt/btrfs
  umount $dev &> /dev/null
  umount $mnt &> /dev/null

  mkfs.btrfs -f $dev -b 1G
  mount -o nospace_cache $dev $mnt
  touch $mnt/foobar
  chattr +C $mnt/foobar
  xfs_io -f -c "pwrite -b 4k 0 4k" $mnt/foobar > /dev/null
  xfs_io -f -c "pwrite -b 4k 0 1G" $mnt/padding &> /dev/null
  sync
  xfs_io -c "fpunch 0 2k" $mnt/foobar
  umount $mnt

Currently this will fail at the fpunch part.

[CAUSE]
Because btrfs_truncate_block() always reserves space without checking
the NOCOW attribute.

Since the writeback path follows the NOCOW bit, we only need to bother
the space reservation code in btrfs_truncate_block().

[FIX]
Make btrfs_truncate_block() follow btrfs_buffered_write() and try to
reserve data space first, falling back to the NOCOW check only when we
don't have enough space.

Such always-try-reserve is an optimization introduced in
btrfs_buffered_write(), to avoid the expensive btrfs_check_can_nocow()
call.

This patch will export check_can_nocow() as btrfs_check_can_nocow(),
and use it in btrfs_truncate_block() to fix the problem.

Reference: https://patchwork.kernel.org/project/linux-btrfs/patch/20200130052822.11765…
Reported-by: Martin Doucha <martin.doucha(a)suse.com>
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: Anand Jain <anand.jain(a)oracle.com>
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: Gou Hao <gouhao(a)uniontech.com>
Signed-off-by: Cheng Jian <cj.chengjian(a)huawei.com>
---
 fs/btrfs/ctree.h |  3 ++-
 fs/btrfs/file.c  |  8 ++++----
 fs/btrfs/inode.c | 39 +++++++++++++++++++++++++++++------
 3 files changed, 39 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 49ef2e48a8c6..3d8c699e44ea 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3273,7 +3273,8 @@ int btrfs_dirty_pages(struct inode *inode, struct page **pages,
 int btrfs_fdatawrite_range(struct inode *inode, loff_t start, loff_t end);
 int btrfs_clone_file_range(struct file *file_in, loff_t pos_in,
 			   struct file *file_out, loff_t pos_out, u64 len);
-
+int btrfs_check_can_nocow(struct btrfs_inode *inode, loff_t pos,
+			  size_t *write_bytes);
 /* tree-defrag.c */
 int btrfs_defrag_leaves(struct btrfs_trans_handle *trans,
 			struct btrfs_root *root);
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index d9d90f0b66d2..f0600b1c6d90 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1536,8 +1536,8 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages,
 	return ret;
 }
 
-static noinline int check_can_nocow(struct btrfs_inode *inode, loff_t pos,
-				    size_t *write_bytes)
+int btrfs_check_can_nocow(struct btrfs_inode *inode, loff_t pos,
+			  size_t *write_bytes)
 {
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
 	struct btrfs_root *root = inode->root;
@@ -1647,7 +1647,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		if (ret < 0) {
 			if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
 						      BTRFS_INODE_PREALLOC)) &&
-			    check_can_nocow(BTRFS_I(inode), pos,
+			    btrfs_check_can_nocow(BTRFS_I(inode), pos,
 					&write_bytes) > 0) {
 				/*
 				 * For nodata cow case, no need to reserve
@@ -1923,7 +1923,7 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
 		 */
 		if (!(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
 					       BTRFS_INODE_PREALLOC)) ||
-		    check_can_nocow(BTRFS_I(inode), pos, &count) <= 0) {
+		    btrfs_check_can_nocow(BTRFS_I(inode), pos, &count) <= 0) {
 			inode_unlock(inode);
 			return -EAGAIN;
 		}
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 69e376d27bcc..52da573741ed 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4901,11 +4901,13 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
 	struct extent_state *cached_state = NULL;
 	struct extent_changeset *data_reserved = NULL;
 	char *kaddr;
+	bool only_release_metadata = false;
 	u32 blocksize = fs_info->sectorsize;
 	pgoff_t index = from >> PAGE_SHIFT;
 	unsigned offset = from & (blocksize - 1);
 	struct page *page;
 	gfp_t mask = btrfs_alloc_write_mask(mapping);
+	size_t write_bytes = blocksize;
 	int ret = 0;
 	u64 block_start;
 	u64 block_end;
@@ -4917,10 +4919,26 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
 	block_start = round_down(from, blocksize);
 	block_end = block_start + blocksize - 1;
 
-	ret = btrfs_delalloc_reserve_space(inode, &data_reserved,
-					   block_start, blocksize);
-	if (ret)
+	ret = btrfs_check_data_free_space(inode, &data_reserved,
+					  block_start, blocksize);
+	if (ret < 0) {
+		if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
+					      BTRFS_INODE_PREALLOC)) &&
+		    btrfs_check_can_nocow(BTRFS_I(inode), block_start,
+					  &write_bytes) > 0) {
+			/* For nocow case, no need to reserve data space */
+			only_release_metadata = true;
+		} else {
+			goto out;
+		}
+	}
+	ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), blocksize);
+	if (ret < 0) {
+		if (!only_release_metadata)
+			btrfs_free_reserved_data_space(inode, data_reserved,
+						       block_start, blocksize);
 		goto out;
+	}
 
 again:
 	page = find_or_create_page(mapping, index, mask);
@@ -4991,10 +5009,19 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
 	set_page_dirty(page);
 	unlock_extent_cached(io_tree, block_start, block_end, &cached_state);
 
+	if (only_release_metadata)
+		set_extent_bit(&BTRFS_I(inode)->io_tree, block_start,
+			       block_end, EXTENT_NORESERVE, NULL, NULL,
+			       GFP_NOFS);
 out_unlock:
-	if (ret)
-		btrfs_delalloc_release_space(inode, data_reserved, block_start,
-					     blocksize, true);
+	if (ret) {
+		if (only_release_metadata)
+			btrfs_delalloc_release_metadata(BTRFS_I(inode),
+							blocksize, true);
+		else
+			btrfs_delalloc_release_space(inode, data_reserved,
+					block_start, blocksize, true);
+	}
 	btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize);
 	unlock_page(page);
 	put_page(page);
-- 
2.25.1
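
The [FIX] section describes a try-cheap-first pattern: attempt the data
space reservation unconditionally, and run the more expensive nocow
check only when the reservation fails. Here is a compilable C sketch of
that control flow (the stub helpers are assumptions for illustration;
only the shape mirrors the patch):

  #include <stdbool.h>
  #include <stdio.h>
  #include <errno.h>

  /* Stubs standing in for the btrfs helpers (illustrative only). */
  static int check_data_free_space(void) { return -ENOSPC; } /* data gone */
  static int check_can_nocow(void)       { return 1; }   /* NOCOW possible */
  static int reserve_metadata(void)      { return 0; }

  static int reserve_block(bool nodatacow, bool *only_release_metadata)
  {
      int ret = check_data_free_space();   /* cheap path, tried first */

      if (ret < 0) {
          /* expensive check only on ENOSPC, and only for NOCOW files */
          if (nodatacow && check_can_nocow() > 0)
              *only_release_metadata = true;  /* skip data reservation */
          else
              return ret;                     /* genuine ENOSPC        */
      }
      return reserve_metadata();    /* metadata is needed either way */
  }

  int main(void)
  {
      bool only_meta = false;
      int ret = reserve_block(true, &only_meta);

      printf("ret=%d only_release_metadata=%d\n", ret, only_meta);
      return 0;
  }

The design choice is that the common case (space available) pays only
the cheap reservation, while the fallback is confined to the ENOSPC
path.
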
[Meeting Notice] openEuler kernel Technical Sharing Session #7 & Biweekly Meeting, Time: 2021-06-25 14:00-18:00
by Meeting Book 25 Jun '21

[PATCH kernel-4.19] ext4: fix memory leak in ext4_fill_super
by Yang Yingliang 24 Jun '21

From: Alexey Makhalov <amakhalov(a)vmware.com>

mainline inclusion
from mainline-v5.13-rc5
commit afd09b617db3786b6ef3dc43e28fe728cfea84df
category: bugfix
bugzilla: 148443
CVE: NA

-----------------------------------------------

Buffer head references must be released before calling kill_bdev();
otherwise the buffer head (and its page referenced by b_data) will not
be freed by kill_bdev, and subsequently that bh will be leaked.

If blocksizes differ, sb_set_blocksize() will kill the current buffers
and page cache by using kill_bdev(). The superblock is then read again,
this time using the correct blocksize. But because the bh was still
held (busy), sb_set_blocksize() could not fully free the superblock
page and buffer head, so they leaked.

This can easily be reproduced by an infinite loop of:

  systemctl start <ext4_on_lvm>.mount
  systemctl stop <ext4_on_lvm>.mount

... since systemd creates a cgroup for each slice which it mounts, the
bh leak gets amplified by a dying memory cgroup that also never gets
freed, and the memory consumption is much more easily noticed.

Fixes: ce40733ce93d ("ext4: Check for return value from sb_set_blocksize")
Fixes: ac27a0ec112a ("ext4: initial copy of files from ext3")
Link: https://lore.kernel.org/r/20210521075533.95732-1-amakhalov@vmware.com
Signed-off-by: Alexey Makhalov <amakhalov(a)vmware.com>
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
Cc: stable(a)kernel.org

conflicts:
  fs/ext4/super.c

Signed-off-by: Ye Bin <yebin10(a)huawei.com>
Reviewed-by: Zhang Yi <yi.zhang(a)huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
 fs/ext4/super.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index a964ed63601c6..f66bbe73d1a94 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4179,14 +4179,20 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 	}
 
 	if (sb->s_blocksize != blocksize) {
+		/*
+		 * bh must be released before kill_bdev(), otherwise
+		 * it won't be freed and its page also. kill_bdev()
+		 * is called by sb_set_blocksize().
+		 */
+		brelse(bh);
 		/* Validate the filesystem blocksize */
 		if (!sb_set_blocksize(sb, blocksize)) {
 			ext4_msg(sb, KERN_ERR, "bad block size %d",
 					blocksize);
+			bh = NULL;
 			goto failed_mount;
 		}
 
-		brelse(bh);
 		logical_sb_block = sb_block * EXT4_MIN_BLOCK_SIZE;
 		offset = do_div(logical_sb_block, blocksize);
 		bh = ext4_sb_bread_unmovable(sb, logical_sb_block);
@@ -4861,8 +4867,9 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 	for (i = 0; i < EXT4_MAXQUOTAS; i++)
 		kfree(sbi->s_qf_names[i]);
 #endif
-	ext4_blkdev_remove(sbi);
+	/* ext4_blkdev_remove() calls kill_bdev(), release bh before it. */
 	brelse(bh);
+	ext4_blkdev_remove(sbi);
 out_fail:
 	sb->s_fs_info = NULL;
 	kfree(sbi->s_blockgroup_lock);
-- 
2.25.1
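
The ordering rule the patch enforces is general: a cache teardown such
as kill_bdev() can only free objects with no outstanding references, so
the local reference must be put first. A minimal refcount sketch of the
rule (an illustrative analogue, not ext4 code):

  #include <stdio.h>

  struct buf { int refs; };

  static struct buf *cache_get(struct buf *b) { b->refs++; return b; }
  static void buf_put(struct buf *b)          { b->refs--; }

  static void cache_drop_all(struct buf *b)   /* ~ kill_bdev() */
  {
      if (b->refs)
          printf("leak: %d reference(s) still held\n", b->refs);
      else
          printf("freed\n");
  }

  int main(void)
  {
      struct buf b = { 0 };
      struct buf *bh = cache_get(&b);

      buf_put(bh);          /* ~ brelse(bh): must come first      */
      cache_drop_all(&b);   /* ~ kill_bdev(): can now really free */
      return 0;
  }

Swapping the last two calls reproduces the leak in miniature: the
teardown runs while the reference is still held, and the object is
never freed.
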
[Meeting Notice] openEuler kernel Technical Sharing Session #7 & Biweekly Meeting, Time: 2021-06-25 14:00-18:00
by Meeting Book 24 Jun '21

Come see if you made the list! The openEuler Task Challenge May leaderboard is out
by Marketing openEuler 24 Jun '21

After a month of competition, the May leaderboard of the openEuler Task
Challenge is here. Over the month, developers joined with great
enthusiasm: 38 people claimed 118 challenge tasks in total, and the
challenge discussion group now has 70 members. A reminder to anyone who
has claimed a task but has not yet joined the group: please contact the
assistant to be added.

Now let's see which developers made the May list!

  Gitee account     Gitee email                Points   Issues/PRs resolved
  li-kaiyuan66666   likaiyuan7(a)huawei.com    6        1
  yanxiaobing2020   yanxiaobing(a)huawei.com   2        1
  zhu-yuncheng      zhuyuncheng(a)huawei.com   6        1
  polite2anyone     zhangyao05(a)outlook.com   10       2
  yaozc701          yaozc7(a)chinaunicom.cn    18       2

Congratulations to the developers above; winners, please contact the
assistant to claim your prizes. If you did not make the list, don't be
discouraged and keep at it: the top ten of both the weekly and monthly
leaderboards are rewarded, and we hope to see you on the list soon.
Let's look forward to the June leaderboard together.

Add the assistant on WeChat and reply "任务打榜赛" (Task Challenge) to
join the discussion group.

Learn more:
  - Official site: https://openeuler.org/zh/interaction/hdc/2021-developer.html
  - Sign up: https://competition.huaweicloud.com/information/1000041420/introduction
  - Work on tasks: https://gitee.com/organizations/src-openeuler/issues?assignee_id=&author_id…
[PATCH openEuler-21.03] net: hns3: fix incorrect resp_msg issue
by moyu 23 Jun '21

From: Jiaran Zhang <zhangjiaran(a)huawei.com>

stable inclusion
from stable-5.12.9
commit d207c1e8e3fb82aca5ffd14c33d0848b6b820efc
bugzilla: https://bugzilla.openeuler.org/show_bug.cgi?id=205
CVE: NA

--------------------------------

[ Upstream commit a710b9ffbebaf713f7dbd4dbd9524907e5d66f33 ]

In hclge_mbx_handler(), if there are two consecutive mailbox messages
that require resp_msg, resp_msg is not cleared after the first message
is processed, which makes the resp_msg data of the second message
incorrect.

Fix it by clearing resp_msg before processing every mailbox message.

Fixes: bb5790b71bad ("net: hns3: refactor mailbox response scheme between PF and VF")
Signed-off-by: Jiaran Zhang <zhangjiaran(a)huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong(a)huawei.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: moyu <xiahua.991116(a)qq.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
index 3ab6db2588d3..324d150d2b22 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mbx.c
@@ -657,7 +657,6 @@ void hclge_mbx_handler(struct hclge_dev *hdev)
 	unsigned int flag;
 	int ret = 0;
 
-	memset(&resp_msg, 0, sizeof(resp_msg));
 	/* handle all the mailbox requests in the queue */
 	while (!hclge_cmd_crq_empty(&hdev->hw)) {
 		if (test_bit(HCLGE_STATE_CMD_DISABLE, &hdev->state)) {
@@ -685,6 +684,9 @@ void hclge_mbx_handler(struct hclge_dev *hdev)
 
 		trace_hclge_pf_mbx_get(hdev, req);
 
+		/* clear the resp_msg before processing every mailbox message */
+		memset(&resp_msg, 0, sizeof(resp_msg));
+
 		switch (req->msg.code) {
 		case HCLGE_MBX_MAP_RING_TO_VECTOR:
 			ret = hclge_map_unmap_ring_to_vf_vector(vport, true,
-- 
2.23.0
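
The bug is an instance of a common pattern: a response buffer reused
across loop iterations keeps stale fields from the previous message
unless it is reset at the top of each iteration. A small self-contained
C sketch (simplified, not the hns3 code):

  #include <string.h>
  #include <stdio.h>

  struct resp { int len; char data[8]; };

  static void handle(int code, struct resp *r)
  {
      if (code == 1) {             /* only some requests fill the resp */
          r->len = 4;
          memcpy(r->data, "ack!", 4);
      }
  }

  int main(void)
  {
      int queue[] = { 1, 2 };      /* message 2 produces no resp data */
      struct resp resp;

      for (int i = 0; i < 2; i++) {
          memset(&resp, 0, sizeof(resp));   /* the fix: clear per message */
          handle(queue[i], &resp);
          printf("msg %d -> resp len %d\n", queue[i], resp.len);
      }
      return 0;
  }

With the memset hoisted out of the loop, as in the original code,
message 2 would report the stale len of 4 left over from message 1.
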
[PATCH openEuler-1.0-LTS 1/4] RDMA/ucma: Put a lock around every call to the rdma_cm layer
by Yang Yingliang 23 Jun '21

From: Jason Gunthorpe <jgg(a)mellanox.com>

stable inclusion
from linux-4.19.115
commit abc4ea7f1345398261295345fd9b30243e4f4f8e
prepare for fixing CVE-2020-36385

--------------------------------

commit 7c11910783a1ea17e88777552ef146cace607b3c upstream.

The rdma_cm must be used single threaded. This appears to be a bug in
the design, as it does have lots of locking that seems like it should
allow concurrency. However, when it is all said and done every single
place that uses the cma_exch() scheme is broken, and all the unlocked
reads from the ucma of the cm_id data are wrong too.

syzkaller has been finding endless bugs related to this. Fixing this in
any elegant way is some enormous amount of work. Take a very big hammer
and put a mutex around everything to do with the ucma_context at the
top of every syscall.

Fixes: 75216638572f ("RDMA/cma: Export rdma cm interface to userspace")
Link: https://lore.kernel.org/r/20200218210432.GA31966@ziepe.ca
Reported-by: syzbot+adb15cf8c2798e4e0db4(a)syzkaller.appspotmail.com
Reported-by: syzbot+e5579222b6a3edd96522(a)syzkaller.appspotmail.com
Reported-by: syzbot+4b628fcc748474003457(a)syzkaller.appspotmail.com
Reported-by: syzbot+29ee8f76017ce6cf03da(a)syzkaller.appspotmail.com
Reported-by: syzbot+6956235342b7317ec564(a)syzkaller.appspotmail.com
Reported-by: syzbot+b358909d8d01556b790b(a)syzkaller.appspotmail.com
Reported-by: syzbot+6b46b135602a3f3ac99e(a)syzkaller.appspotmail.com
Reported-by: syzbot+8458d13b13562abf6b77(a)syzkaller.appspotmail.com
Reported-by: syzbot+bd034f3fdc0402e942ed(a)syzkaller.appspotmail.com
Reported-by: syzbot+c92378b32760a4eef756(a)syzkaller.appspotmail.com
Reported-by: syzbot+68b44a1597636e0b342c(a)syzkaller.appspotmail.com
Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
 drivers/infiniband/core/ucma.c | 49 ++++++++++++++++++++++++++++++++--
 1 file changed, 47 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index 01d68ed46c1b6..2acc30c3d5b2d 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -89,6 +89,7 @@ struct ucma_context {
 	struct ucma_file	*file;
 	struct rdma_cm_id	*cm_id;
+	struct mutex		mutex;
 	u64			uid;
 
 	struct list_head	list;
@@ -215,6 +216,7 @@ static struct ucma_context *ucma_alloc_ctx(struct ucma_file *file)
 	init_completion(&ctx->comp);
 	INIT_LIST_HEAD(&ctx->mc_list);
 	ctx->file = file;
+	mutex_init(&ctx->mutex);
 
 	mutex_lock(&mut);
 	ctx->id = idr_alloc(&ctx_idr, ctx, 0, 0, GFP_KERNEL);
@@ -596,6 +598,7 @@ static int ucma_free_ctx(struct ucma_context *ctx)
 	}
 
 	events_reported = ctx->events_reported;
+	mutex_destroy(&ctx->mutex);
 	kfree(ctx);
 	return events_reported;
 }
@@ -665,7 +668,10 @@ static ssize_t ucma_bind_ip(struct ucma_file *file, const char __user *inbuf,
 	if (IS_ERR(ctx))
 		return PTR_ERR(ctx);
 
+	mutex_lock(&ctx->mutex);
 	ret = rdma_bind_addr(ctx->cm_id, (struct sockaddr *) &cmd.addr);
+	mutex_unlock(&ctx->mutex);
+
 	ucma_put_ctx(ctx);
 	return ret;
 }
@@ -688,7 +694,9 @@ static ssize_t ucma_bind(struct ucma_file *file, const char __user *inbuf,
 	if (IS_ERR(ctx))
 		return PTR_ERR(ctx);
 
+	mutex_lock(&ctx->mutex);
 	ret = rdma_bind_addr(ctx->cm_id, (struct sockaddr *) &cmd.addr);
+	mutex_unlock(&ctx->mutex);
 	ucma_put_ctx(ctx);
 	return ret;
 }
@@ -712,8 +720,10 @@ static ssize_t ucma_resolve_ip(struct ucma_file *file,
 	if (IS_ERR(ctx))
 		return PTR_ERR(ctx);
 
+	mutex_lock(&ctx->mutex);
 	ret = rdma_resolve_addr(ctx->cm_id, (struct sockaddr *) &cmd.src_addr,
 				(struct sockaddr *) &cmd.dst_addr, cmd.timeout_ms);
+	mutex_unlock(&ctx->mutex);
 	ucma_put_ctx(ctx);
 	return ret;
 }
@@ -738,8 +748,10 @@ static ssize_t ucma_resolve_addr(struct ucma_file *file,
 	if (IS_ERR(ctx))
 		return PTR_ERR(ctx);
 
+	mutex_lock(&ctx->mutex);
 	ret = rdma_resolve_addr(ctx->cm_id, (struct sockaddr *) &cmd.src_addr,
 				(struct sockaddr *) &cmd.dst_addr, cmd.timeout_ms);
+	mutex_unlock(&ctx->mutex);
 	ucma_put_ctx(ctx);
 	return ret;
 }
@@ -759,7 +771,9 @@ static ssize_t ucma_resolve_route(struct ucma_file *file,
 	if (IS_ERR(ctx))
 		return PTR_ERR(ctx);
 
+	mutex_lock(&ctx->mutex);
 	ret = rdma_resolve_route(ctx->cm_id, cmd.timeout_ms);
+	mutex_unlock(&ctx->mutex);
 	ucma_put_ctx(ctx);
 	return ret;
 }
@@ -848,6 +862,7 @@ static ssize_t ucma_query_route(struct ucma_file *file,
 	if (IS_ERR(ctx))
 		return PTR_ERR(ctx);
 
+	mutex_lock(&ctx->mutex);
 	memset(&resp, 0, sizeof resp);
 	addr = (struct sockaddr *) &ctx->cm_id->route.addr.src_addr;
 	memcpy(&resp.src_addr, addr, addr->sa_family == AF_INET ?
@@ -871,6 +886,7 @@ static ssize_t ucma_query_route(struct ucma_file *file,
 		ucma_copy_iw_route(&resp, &ctx->cm_id->route);
 
 out:
+	mutex_unlock(&ctx->mutex);
 	if (copy_to_user(u64_to_user_ptr(cmd.response), &resp, sizeof(resp)))
 		ret = -EFAULT;
@@ -1022,6 +1038,7 @@ static ssize_t ucma_query(struct ucma_file *file,
 	if (IS_ERR(ctx))
 		return PTR_ERR(ctx);
 
+	mutex_lock(&ctx->mutex);
 	switch (cmd.option) {
 	case RDMA_USER_CM_QUERY_ADDR:
 		ret = ucma_query_addr(ctx, response, out_len);
@@ -1036,6 +1053,7 @@ static ssize_t ucma_query(struct ucma_file *file,
 		ret = -ENOSYS;
 		break;
 	}
+	mutex_unlock(&ctx->mutex);
 
 	ucma_put_ctx(ctx);
 	return ret;
@@ -1076,7 +1094,9 @@ static ssize_t ucma_connect(struct ucma_file *file, const char __user *inbuf,
 		return PTR_ERR(ctx);
 
 	ucma_copy_conn_param(ctx->cm_id, &conn_param, &cmd.conn_param);
+	mutex_lock(&ctx->mutex);
 	ret = rdma_connect(ctx->cm_id, &conn_param);
+	mutex_unlock(&ctx->mutex);
 	ucma_put_ctx(ctx);
 	return ret;
 }
@@ -1097,7 +1117,9 @@ static ssize_t ucma_listen(struct ucma_file *file, const char __user *inbuf,
 	ctx->backlog = cmd.backlog > 0 && cmd.backlog < max_backlog ?
 		       cmd.backlog : max_backlog;
+	mutex_lock(&ctx->mutex);
 	ret = rdma_listen(ctx->cm_id, ctx->backlog);
+	mutex_unlock(&ctx->mutex);
 	ucma_put_ctx(ctx);
 	return ret;
 }
@@ -1120,13 +1142,17 @@ static ssize_t ucma_accept(struct ucma_file *file, const char __user *inbuf,
 	if (cmd.conn_param.valid) {
 		ucma_copy_conn_param(ctx->cm_id, &conn_param, &cmd.conn_param);
 		mutex_lock(&file->mut);
+		mutex_lock(&ctx->mutex);
 		ret = __rdma_accept(ctx->cm_id, &conn_param, NULL);
+		mutex_unlock(&ctx->mutex);
 		if (!ret)
 			ctx->uid = cmd.uid;
 		mutex_unlock(&file->mut);
-	} else
+	} else {
+		mutex_lock(&ctx->mutex);
 		ret = __rdma_accept(ctx->cm_id, NULL, NULL);
-
+		mutex_unlock(&ctx->mutex);
+	}
 	ucma_put_ctx(ctx);
 	return ret;
 }
@@ -1145,7 +1171,9 @@ static ssize_t ucma_reject(struct ucma_file *file, const char __user *inbuf,
 	if (IS_ERR(ctx))
 		return PTR_ERR(ctx);
 
+	mutex_lock(&ctx->mutex);
 	ret = rdma_reject(ctx->cm_id, cmd.private_data, cmd.private_data_len);
+	mutex_unlock(&ctx->mutex);
 	ucma_put_ctx(ctx);
 	return ret;
 }
@@ -1164,7 +1192,9 @@ static ssize_t ucma_disconnect(struct ucma_file *file, const char __user *inbuf,
 	if (IS_ERR(ctx))
 		return PTR_ERR(ctx);
 
+	mutex_lock(&ctx->mutex);
 	ret = rdma_disconnect(ctx->cm_id);
+	mutex_unlock(&ctx->mutex);
 	ucma_put_ctx(ctx);
 	return ret;
 }
@@ -1195,7 +1225,9 @@ static ssize_t ucma_init_qp_attr(struct ucma_file *file,
 	resp.qp_attr_mask = 0;
 	memset(&qp_attr, 0, sizeof qp_attr);
 	qp_attr.qp_state = cmd.qp_state;
+	mutex_lock(&ctx->mutex);
 	ret = rdma_init_qp_attr(ctx->cm_id, &qp_attr, &resp.qp_attr_mask);
+	mutex_unlock(&ctx->mutex);
 	if (ret)
 		goto out;
@@ -1274,9 +1306,13 @@ static int ucma_set_ib_path(struct ucma_context *ctx,
 		struct sa_path_rec opa;
 
 		sa_convert_path_ib_to_opa(&opa, &sa_path);
+		mutex_lock(&ctx->mutex);
 		ret = rdma_set_ib_path(ctx->cm_id, &opa);
+		mutex_unlock(&ctx->mutex);
 	} else {
+		mutex_lock(&ctx->mutex);
 		ret = rdma_set_ib_path(ctx->cm_id, &sa_path);
+		mutex_unlock(&ctx->mutex);
 	}
 	if (ret)
 		return ret;
@@ -1309,7 +1345,9 @@ static int ucma_set_option_level(struct ucma_context *ctx, int level,
 	switch (level) {
 	case RDMA_OPTION_ID:
+		mutex_lock(&ctx->mutex);
 		ret = ucma_set_option_id(ctx, optname, optval, optlen);
+		mutex_unlock(&ctx->mutex);
 		break;
 	case RDMA_OPTION_IB:
 		ret = ucma_set_option_ib(ctx, optname, optval, optlen);
@@ -1369,8 +1407,10 @@ static ssize_t ucma_notify(struct ucma_file *file, const char __user *inbuf,
 	if (IS_ERR(ctx))
 		return PTR_ERR(ctx);
 
+	mutex_lock(&ctx->mutex);
 	if (ctx->cm_id->device)
 		ret = rdma_notify(ctx->cm_id, (enum ib_event_type)cmd.event);
+	mutex_unlock(&ctx->mutex);
 
 	ucma_put_ctx(ctx);
 	return ret;
@@ -1413,8 +1453,10 @@ static ssize_t ucma_process_join(struct ucma_file *file,
 	mc->join_state = join_state;
 	mc->uid = cmd->uid;
 	memcpy(&mc->addr, addr, cmd->addr_size);
+	mutex_lock(&ctx->mutex);
 	ret = rdma_join_multicast(ctx->cm_id, (struct sockaddr *)&mc->addr,
 				  join_state, mc);
+	mutex_unlock(&ctx->mutex);
 	if (ret)
 		goto err2;
@@ -1518,7 +1560,10 @@ static ssize_t ucma_leave_multicast(struct ucma_file *file,
 		goto out;
 	}
 
+	mutex_lock(&mc->ctx->mutex);
 	rdma_leave_multicast(mc->ctx->cm_id, (struct sockaddr *) &mc->addr);
+	mutex_unlock(&mc->ctx->mutex);
+
 	mutex_lock(&mc->ctx->file->mut);
 	ucma_cleanup_mc_events(mc);
 	list_del(&mc->list);
-- 
2.25.1
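
The "big hammer" is simply per-object serialization: one mutex taken at
the top of every entry point, so the layer underneath can assume
single-threaded use per context. A minimal pthread sketch of the shape
(simplified; the names are illustrative, not the ucma code):

  #include <pthread.h>
  #include <stdio.h>

  struct ctx {
      pthread_mutex_t mutex;
      int state;                /* stands in for the rdma_cm id state */
  };

  static void cm_layer_op(struct ctx *c) { c->state++; } /* not thread-safe */

  static void *syscall_entry(void *arg)  /* ~ ucma_bind(), ucma_connect()... */
  {
      struct ctx *c = arg;

      for (int i = 0; i < 100000; i++) {
          pthread_mutex_lock(&c->mutex);  /* one lock around every call */
          cm_layer_op(c);
          pthread_mutex_unlock(&c->mutex);
      }
      return NULL;
  }

  int main(void)
  {
      struct ctx c = { PTHREAD_MUTEX_INITIALIZER, 0 };
      pthread_t t1, t2;

      pthread_create(&t1, NULL, syscall_entry, &c);
      pthread_create(&t2, NULL, syscall_entry, &c);
      pthread_join(t1, NULL);
      pthread_join(t2, NULL);
      printf("state = %d (always 200000 with the mutex)\n", c.state);
      return 0;
  }

The trade-off is the one the commit message acknowledges: coarse
per-context locking gives up concurrency within a context in exchange
for making the whole class of races unrepresentable.
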
[Meeting Notice] openEuler kernel Technical Sharing Session #7 & Biweekly Meeting, Time: 2021-06-25 14:00-18:00
by Meeting Book 23 Jun '21
