From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

stable inclusion
from stable-v4.19.271
commit 26436553aabfd9b40e1daa537a099bf5bb13fb55
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6I7U3
CVE: CVE-2023-1074
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=...
--------------------------------
[ Upstream commit 458e279f861d3f61796894cd158b780765a1569f ]
Currently, if you bind the socket to something like:
        servaddr.sin6_family = AF_INET6;
        servaddr.sin6_port = htons(0);
        servaddr.sin6_scope_id = 0;
        inet_pton(AF_INET6, "::1", &servaddr.sin6_addr);

And then request a connect to:
        connaddr.sin6_family = AF_INET6;
        connaddr.sin6_port = htons(20000);
        connaddr.sin6_scope_id = if_nametoindex("lo");
        inet_pton(AF_INET6, "fe88::1", &connaddr.sin6_addr);

What the stack does is:
 - bind the socket
 - create a new asoc
 - to handle the connect
   - copy the addresses that can be used for the given scope
   - try to connect
But the copy returns 0 addresses, and the effect is that it ends up trying to connect as if the socket wasn't bound, which is not the desired behavior. This unexpected behavior also allows KASLR leaks through SCTP diag interface.
The fix here then is, if when trying to copy the addresses that can be used for the scope used in connect() it returns 0 addresses, bail out. This is what TCP does with a similar reproducer.
Reported-by: Pietro Borrello <borrello@diag.uniroma1.it>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/9fcd182f1099f86c6661f3717f63712ddd1c676c.167449673...
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 net/sctp/bind_addr.c | 6 ++++++
 1 file changed, 6 insertions(+)
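As a rough userspace illustration of the check this patch adds (not kernel code): the sketch below copies only the addresses usable for a requested scope and fails with -ENETUNREACH when none match. The names `addr_entry`, `scope_allows()` and `bind_addr_copy()` are hypothetical stand-ins, and the exact-match scope rule is an assumption made only to keep the example short; the kernel's real scope policy is more involved.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel's scope handling. */
enum addr_scope { SCOPE_LOOPBACK, SCOPE_LINK, SCOPE_GLOBAL };

struct addr_entry {
	enum addr_scope scope;
	int id;			/* placeholder for the real address */
};

/* Assumed policy for this sketch only: an address is usable when its
 * scope exactly matches the requested one.
 */
static int scope_allows(enum addr_scope addr, enum addr_scope wanted)
{
	return addr == wanted;
}

/* Copy every usable address from src into dest.
 * Returns the number of addresses copied, -ENOMEM if dest is too small,
 * or -ENETUNREACH when nothing matched the requested scope.
 */
static int bind_addr_copy(struct addr_entry *dest, size_t dest_len,
			  const struct addr_entry *src, size_t src_len,
			  enum addr_scope wanted)
{
	size_t i, copied = 0;

	for (i = 0; i < src_len; i++) {
		if (!scope_allows(src[i].scope, wanted))
			continue;
		if (copied == dest_len)
			return -ENOMEM;
		dest[copied++] = src[i];
	}

	/* No usable address for this scope: fail instead of carrying on
	 * as if the socket had never been bound.
	 */
	if (copied == 0)
		return -ENETUNREACH;

	return (int)copied;
}
```

With a socket bound only to a loopback-scope address, a copy for link scope returns -ENETUNREACH in this sketch, which mirrors the bind-to-"::1" then connect-to-"fe88::1" case from the commit message.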
diff --git a/net/sctp/bind_addr.c b/net/sctp/bind_addr.c
index f8a283245672..d723942e5e65 100644
--- a/net/sctp/bind_addr.c
+++ b/net/sctp/bind_addr.c
@@ -88,6 +88,12 @@ int sctp_bind_addr_copy(struct net *net, struct sctp_bind_addr *dest,
 		}
 	}
 
+	/* If somehow no addresses were found that can be used with this
+	 * scope, it's an error.
+	 */
+	if (list_empty(&dest->address_list))
+		error = -ENETUNREACH;
+
 out:
 	if (error)
 		sctp_bind_addr_clean(dest);
From: Ye Bin <yebin10@huawei.com>

maillist inclusion
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6K53I
CVE: NA
Reference: https://patchwork.ozlabs.org/project/linux-ext4/patch/20230116020015.1506120...
--------------------------------
Syzbot found the following issue:
EXT4-fs: Warning: mounting with data=journal disables delayed allocation, dioread_nolock, O_DIRECT and fast_commit support!
EXT4-fs (loop0): orphan cleanup on readonly fs
------------[ cut here ]------------
WARNING: CPU: 1 PID: 5067 at fs/ext4/mballoc.c:1869 mb_find_extent+0x8a1/0xe30
Modules linked in:
CPU: 1 PID: 5067 Comm: syz-executor307 Not tainted 6.2.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
RIP: 0010:mb_find_extent+0x8a1/0xe30 fs/ext4/mballoc.c:1869
RSP: 0018:ffffc90003c9e098 EFLAGS: 00010293
RAX: ffffffff82405731 RBX: 0000000000000041 RCX: ffff8880783457c0
RDX: 0000000000000000 RSI: 0000000000000041 RDI: 0000000000000040
RBP: 0000000000000040 R08: ffffffff82405723 R09: ffffed10053c9402
R10: ffffed10053c9402 R11: 1ffff110053c9401 R12: 0000000000000000
R13: ffffc90003c9e538 R14: dffffc0000000000 R15: ffffc90003c9e2cc
FS:  0000555556665300(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000056312f6796f8 CR3: 0000000022437000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 ext4_mb_complex_scan_group+0x353/0x1100 fs/ext4/mballoc.c:2307
 ext4_mb_regular_allocator+0x1533/0x3860 fs/ext4/mballoc.c:2735
 ext4_mb_new_blocks+0xddf/0x3db0 fs/ext4/mballoc.c:5605
 ext4_ext_map_blocks+0x1868/0x6880 fs/ext4/extents.c:4286
 ext4_map_blocks+0xa49/0x1cc0 fs/ext4/inode.c:651
 ext4_getblk+0x1b9/0x770 fs/ext4/inode.c:864
 ext4_bread+0x2a/0x170 fs/ext4/inode.c:920
 ext4_quota_write+0x225/0x570 fs/ext4/super.c:7105
 write_blk fs/quota/quota_tree.c:64 [inline]
 get_free_dqblk+0x34a/0x6d0 fs/quota/quota_tree.c:130
 do_insert_tree+0x26b/0x1aa0 fs/quota/quota_tree.c:340
 do_insert_tree+0x722/0x1aa0 fs/quota/quota_tree.c:375
 do_insert_tree+0x722/0x1aa0 fs/quota/quota_tree.c:375
 do_insert_tree+0x722/0x1aa0 fs/quota/quota_tree.c:375
 dq_insert_tree fs/quota/quota_tree.c:401 [inline]
 qtree_write_dquot+0x3b6/0x530 fs/quota/quota_tree.c:420
 v2_write_dquot+0x11b/0x190 fs/quota/quota_v2.c:358
 dquot_acquire+0x348/0x670 fs/quota/dquot.c:444
 ext4_acquire_dquot+0x2dc/0x400 fs/ext4/super.c:6740
 dqget+0x999/0xdc0 fs/quota/dquot.c:914
 __dquot_initialize+0x3d0/0xcf0 fs/quota/dquot.c:1492
 ext4_process_orphan+0x57/0x2d0 fs/ext4/orphan.c:329
 ext4_orphan_cleanup+0xb60/0x1340 fs/ext4/orphan.c:474
 __ext4_fill_super fs/ext4/super.c:5516 [inline]
 ext4_fill_super+0x81cd/0x8700 fs/ext4/super.c:5644
 get_tree_bdev+0x400/0x620 fs/super.c:1282
 vfs_get_tree+0x88/0x270 fs/super.c:1489
 do_new_mount+0x289/0xad0 fs/namespace.c:3145
 do_mount fs/namespace.c:3488 [inline]
 __do_sys_mount fs/namespace.c:3697 [inline]
 __se_sys_mount+0x2d3/0x3c0 fs/namespace.c:3674
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
Add some debug information:
mb_find_extent: mb_find_extent block=41, order=0 needed=64 next=0 ex=0/41/1@3735929054 64 64 7
block_bitmap: ff 3f 0c 00 fc 01 00 00 d2 3d 00 00 00 00 00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

Actually, blocks per group is 64, but the block bitmap indicates that the group has at least 128 blocks. Currently, ext4_validate_block_bitmap() does not check whether the bits for blocks beyond the end of the group are set. To resolve the above issue, add a check like fsck's "Padding at end of block bitmap is not set".
Reported-by: syzbot+68223fe9f6c95ad43bed@syzkaller.appspotmail.com
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 fs/ext4/balloc.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)
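The padding check described in the commit message can be sketched in plain userspace C. This is only an illustration under stated assumptions: `find_next_zero_bit_le()` and `bitmap_padding_check()` are hypothetical helpers written for this example, not the kernel functions, and the bitmap is treated as a byte array with little-endian bit numbering within each byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the index of the first clear bit in [start, nbits), or nbits
 * if every bit in that range is set. Bit n lives in byte n / 8 at
 * position n % 8 (little-endian bit order).
 */
static size_t find_next_zero_bit_le(const uint8_t *map, size_t nbits,
				    size_t start)
{
	size_t bit;

	for (bit = start; bit < nbits; bit++) {
		if (!(map[bit / 8] & (1u << (bit % 8))))
			return bit;
	}
	return nbits;
}

/* Return 0 when all padding bits past the group's last cluster are
 * set, otherwise the index of the first clear padding bit.
 */
static size_t bitmap_padding_check(const uint8_t *map, size_t bitmap_bits,
				   size_t clusters_in_group)
{
	size_t zero;

	if (bitmap_bits <= clusters_in_group)
		return 0;	/* the group fills the bitmap: no padding */

	zero = find_next_zero_bit_le(map, bitmap_bits, clusters_in_group);
	return zero < bitmap_bits ? zero : 0;
}
```

Fed the 32 bytes from the debug output above with 64 clusters per group, this sketch reports bit 64 as the first clear padding bit, because byte 8 is 0xd2 and its lowest bit is clear.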
diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index d5a87464245e..77243483a596 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -303,6 +303,22 @@ struct ext4_group_desc * ext4_get_group_desc(struct super_block *sb,
 	return desc;
 }
 
+static ext4_fsblk_t ext4_valid_block_bitmap_padding(struct super_block *sb,
+						    ext4_group_t block_group,
+						    struct buffer_head *bh)
+{
+	ext4_grpblk_t next_zero_bit;
+	unsigned long bitmap_size = sb->s_blocksize * 8;
+	unsigned int offset = num_clusters_in_group(sb, block_group);
+
+	if (bitmap_size <= offset)
+		return 0;
+
+	next_zero_bit = ext4_find_next_zero_bit(bh->b_data, bitmap_size, offset);
+
+	return (next_zero_bit < bitmap_size ? next_zero_bit : 0);
+}
+
 /*
  * Return the block number which was discovered to be invalid, or 0 if
  * the block bitmap is valid.
@@ -395,6 +411,15 @@ static int ext4_validate_block_bitmap(struct super_block *sb,
 					EXT4_GROUP_INFO_BBITMAP_CORRUPT);
 		return -EFSCORRUPTED;
 	}
+	blk = ext4_valid_block_bitmap_padding(sb, block_group, bh);
+	if (unlikely(blk != 0)) {
+		ext4_unlock_group(sb, block_group);
+		ext4_error(sb, "bg %u: block %llu: padding at end of block bitmap is not set",
+			   block_group, blk);
+		ext4_mark_group_bitmap_corrupted(sb, block_group,
+						 EXT4_GROUP_INFO_BBITMAP_CORRUPT);
+		return -EFSCORRUPTED;
+	}
 	set_buffer_verified(bh);
 verified:
 	ext4_unlock_group(sb, block_group);
From: Li Lingfeng <lilingfeng3@huawei.com>

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I53Q6M
CVE: NA
---------------------------
Commit 9a8887a9e69135c87d0748e589d7d31161d74d77 can't cover some special situations, so revert it and add a more complete one.
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 fs/block_dev.c     | 20 ++------------------
 include/linux/fs.h |  1 -
 2 files changed, 2 insertions(+), 19 deletions(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 6ba91b97753f..4daa2998fbaf 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1666,8 +1666,6 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
 	bdev->bd_openers++;
 	if (for_part)
 		bdev->bd_part_count++;
-	if (mode & FMODE_WRITE)
-		bdev->bd_write_openers++;
 	mutex_unlock(&bdev->bd_mutex);
 	disk_unblock_events(disk);
 	/* only one opener holds refs to the module and disk */
@@ -1715,7 +1713,6 @@ int blkdev_get(struct block_device *bdev, fmode_t mode, void *holder)
 {
 	struct block_device *whole = NULL;
 	int res;
-	char name[BDEVNAME_SIZE];
 
 	WARN_ON_ONCE((mode & FMODE_EXCL) && !holder);
 
@@ -1735,19 +1732,6 @@ int blkdev_get(struct block_device *bdev, fmode_t mode, void *holder)
 	if (whole) {
 		struct gendisk *disk = whole->bd_disk;
 
-		/*
-		 * Open an write opened block device exclusively, the
-		 * writing process may probability corrupt the device,
-		 * such as a mounted file system, give a hint here.
-		 */
-		if (!res && (bdev->bd_write_openers >
-		    ((mode & FMODE_WRITE) ? 1 : 0)) && !bdev->bd_holders) {
-			pr_info_ratelimited("VFS: Open an write opened "
-				"block device exclusively %s [%d %s].\n",
-				bdevname(bdev, name), current->pid,
-				current->comm);
-		}
-
 		/* finish claiming */
 		if (!res) {
 			BUG_ON(!bd_may_claim(bdev, whole, holder));
@@ -1787,6 +1771,8 @@ int blkdev_get(struct block_device *bdev, fmode_t mode, void *holder)
 		bdput(whole);
 	} else {
 		if (!res && (mode & FMODE_WRITE) && bdev->bd_holders) {
+			char name[BDEVNAME_SIZE];
+
 			/*
 			 * Open an exclusive opened device for write may
 			 * probability corrupt the device, such as a
@@ -1934,8 +1920,6 @@ static void __blkdev_put(struct block_device *bdev, fmode_t mode, int for_part)
 		sync_blockdev(bdev);
 
 	mutex_lock_nested(&bdev->bd_mutex, for_part);
-	if (mode & FMODE_WRITE)
-		bdev->bd_write_openers--;
 	if (for_part)
 		bdev->bd_part_count--;
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index bcd2131ca06c..3892a5793c62 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -462,7 +462,6 @@ struct request_queue;
 struct block_device {
 	dev_t			bd_dev;  /* not a kdev_t - it's a search key */
 	int			bd_openers;
-	int			bd_write_openers;
 	struct inode *		bd_inode;	/* will die */
 	struct super_block *	bd_super;
 	struct mutex		bd_mutex;	/* open/close mutex */
From: Li Lingfeng <lilingfeng3@huawei.com>

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I53Q6M
CVE: NA
---------------------------
Commit 14eae9d6bcf68eccf28478694ca764ab8fd2067b can't cover some special situations, so revert it and add a more complete one.
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 fs/block_dev.c | 22 +++-------------------
 1 file changed, 3 insertions(+), 19 deletions(-)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 4daa2998fbaf..58be97f412fd 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1726,13 +1726,13 @@ int blkdev_get(struct block_device *bdev, fmode_t mode, void *holder)
 
 	res = __blkdev_get(bdev, mode, 0);
 
-	mutex_lock(&bdev->bd_mutex);
-	spin_lock(&bdev_lock);
-
 	if (whole) {
 		struct gendisk *disk = whole->bd_disk;
 
 		/* finish claiming */
+		mutex_lock(&bdev->bd_mutex);
+		spin_lock(&bdev_lock);
+
 		if (!res) {
 			BUG_ON(!bd_may_claim(bdev, whole, holder));
 			/*
@@ -1769,22 +1769,6 @@ int blkdev_get(struct block_device *bdev, fmode_t mode, void *holder)
 
 		mutex_unlock(&bdev->bd_mutex);
 		bdput(whole);
-	} else {
-		if (!res && (mode & FMODE_WRITE) && bdev->bd_holders) {
-			char name[BDEVNAME_SIZE];
-
-			/*
-			 * Open an exclusive opened device for write may
-			 * probability corrupt the device, such as a
-			 * mounted file system, give a hint here.
-			 */
-			pr_info_ratelimited("VFS: Open an exclusive opened "
-				"block device for write %s [%d %s].\n",
-				bdevname(bdev, name), current->pid,
-				current->comm);
-		}
-		spin_unlock(&bdev_lock);
-		mutex_unlock(&bdev->bd_mutex);
 	}
 
 	if (res)
From: Li Lingfeng <lilingfeng3@huawei.com>

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I53Q6M
CVE: NA
---------------------------
Currently, we don't have an easy way to identify a file system that was corrupted by data written through the raw block device. It is risky to exclusively open a block device that other processes have already opened for write, since this may lead to data corruption. This patch tracks the write openers of a device and its partitions, and prints a hint when such an exclusive open happens.
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 fs/block_dev.c     | 59 ++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h |  2 ++
 2 files changed, 61 insertions(+)
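The conflict rule this patch introduces can be followed more easily in isolation. The sketch below is a hypothetical userspace mirror of it: `struct bdev_counts` and `conflict_excl_open()` are names invented for this example, and the counters are plain integers with no locking, so it only illustrates the order of the checks.

```c
#include <stdbool.h>

/* Hypothetical userspace mirror of the counters the patch adds. */
struct bdev_counts {
	int holders;		/* exclusive holders of this device */
	int write_openers;	/* writers on this device */
	int part_write_openers;	/* writers on any partition (whole disk only) */
};

/* Decide whether an exclusive open of dev should print the hint.
 * whole is the containing disk and equals dev when dev is itself the
 * whole disk. opening_for_write tells whether the exclusive opener
 * also asked for write access, in which case it is already counted
 * once in dev->write_openers.
 */
static bool conflict_excl_open(const struct bdev_counts *dev,
			       const struct bdev_counts *whole,
			       bool opening_for_write)
{
	/* Already held exclusively: no hint from this check. */
	if (dev->holders)
		return false;

	/* Someone other than the current opener writes to this device. */
	if (dev->write_openers > (opening_for_write ? 1 : 0))
		return true;

	/* Whole disk: any writer on one of its partitions conflicts. */
	if (dev == whole)
		return whole->part_write_openers != 0;

	/* Partition: any writer on the whole disk conflicts. */
	return whole->write_openers != 0;
}
```

In this sketch, a single writer on a partition makes an exclusive open of the whole disk report a conflict, while the same writer opening its own partition exclusively does not.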
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 58be97f412fd..6adb17cc7dbb 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -35,6 +35,7 @@
 #include <linux/falloc.h>
 #include <linux/uaccess.h>
 #include <linux/suspend.h>
+#include <linux/sched/task.h>
 #include "internal.h"
 
 struct bdev_inode {
@@ -1538,6 +1539,39 @@ static void bdev_disk_changed(struct block_device *bdev, bool invalidate)
 	}
 }
 
+static void blkdev_dump_conflict_opener(struct block_device *bdev, char *msg)
+{
+	char name[BDEVNAME_SIZE];
+	struct task_struct *p = NULL;
+	char comm_buf[TASK_COMM_LEN];
+	pid_t p_pid;
+
+	rcu_read_lock();
+	p = rcu_dereference(current->real_parent);
+	get_task_comm(comm_buf, p);
+	p_pid = p->pid;
+	rcu_read_unlock();
+
+	pr_info_ratelimited("%s %s. current [%d %s]. parent [%d %s]\n",
+			    msg, bdevname(bdev, name),
+			    current->pid, current->comm, p_pid, comm_buf);
+}
+
+static bool is_conflict_excl_open(struct block_device *bdev,
+				  struct block_device *whole, fmode_t mode)
+{
+	if (bdev->bd_holders)
+		return false;
+
+	if (bdev->bd_write_openers > ((mode & FMODE_WRITE) ? 1 : 0))
+		return true;
+
+	if (bdev == whole)
+		return !!bdev->bd_part_write_openers;
+
+	return !!whole->bd_write_openers;
+}
+
 /*
  * bd_mutex locking:
  *
@@ -1666,6 +1700,15 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
 	bdev->bd_openers++;
 	if (for_part)
 		bdev->bd_part_count++;
+
+	if (!for_part && (mode & FMODE_WRITE)) {
+		spin_lock(&bdev_lock);
+		bdev->bd_write_openers++;
+		if (bdev->bd_contains != bdev)
+			bdev->bd_contains->bd_part_write_openers++;
+		spin_unlock(&bdev_lock);
+	}
+
 	mutex_unlock(&bdev->bd_mutex);
 	disk_unblock_events(disk);
 	/* only one opener holds refs to the module and disk */
@@ -1732,6 +1775,14 @@ int blkdev_get(struct block_device *bdev, fmode_t mode, void *holder)
 		/* finish claiming */
 		mutex_lock(&bdev->bd_mutex);
 		spin_lock(&bdev_lock);
+		/*
+		 * Open an write opened block device exclusively, the
+		 * writing process may probability corrupt the device,
+		 * such as a mounted file system, give a hint here.
+		 */
+		if (!res && is_conflict_excl_open(bdev, whole, mode))
+			blkdev_dump_conflict_opener(bdev,
+				"VFS: Open an write opened block device exclusively");
 
 		if (!res) {
 			BUG_ON(!bd_may_claim(bdev, whole, holder));
@@ -1907,6 +1958,14 @@ static void __blkdev_put(struct block_device *bdev, fmode_t mode, int for_part)
 	if (for_part)
 		bdev->bd_part_count--;
 
+	if (!for_part && (mode & FMODE_WRITE)) {
+		spin_lock(&bdev_lock);
+		bdev->bd_write_openers--;
+		if (bdev->bd_contains != bdev)
+			bdev->bd_contains->bd_part_write_openers--;
+		spin_unlock(&bdev_lock);
+	}
+
 	if (!--bdev->bd_openers) {
 		WARN_ON_ONCE(bdev->bd_holders);
 		sync_blockdev(bdev);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3892a5793c62..a8e36afa66b0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -465,6 +465,8 @@ struct block_device {
 	struct inode *		bd_inode;	/* will die */
 	struct super_block *	bd_super;
 	struct mutex		bd_mutex;	/* open/close mutex */
+	int			bd_write_openers;
+	int			bd_part_write_openers;
 	void *			bd_claiming;
 	void *			bd_holder;
 	int			bd_holders;
From: Li Lingfeng <lilingfeng3@huawei.com>

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I53Q6M
CVE: NA
--------------------------------
Just as exclusively opening a write-opened block device is risky, opening an exclusively opened block device for write may also lead to data corruption. This patch adds an info message when an exclusively opened block device is opened for write, to hint at the potential data corruption.

Note that there are some legitimate cases, such as file system or device mapper online resize, so this message is just a hint and does not always mean that a risky write has happened.
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 fs/block_dev.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)
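The condition for the write-open hint can be sketched the same way. Everything below is hypothetical userspace code: `struct bdev_state`, `partition_claim_marker` and `conflict_write_open()` are invented for this example. It assumes, as the condition in the patch suggests, that a whole disk whose holder field equals a special marker is only held on behalf of one of its partitions and is therefore not treated as exclusively opened itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the block device fields the check reads. */
struct bdev_state {
	int holders;		/* exclusive holders of this device */
	const void *holder;	/* current holder token, NULL if none */
};

/* Assumed marker for "held only because a partition was claimed";
 * its address plays the role the patch gives to bd_may_claim.
 */
static const char partition_claim_marker = 0;

/* Decide whether a non-exclusive write open of dev should print the
 * hint. whole is the containing disk (equal to dev for a whole disk).
 */
static bool conflict_write_open(const struct bdev_state *dev,
				const struct bdev_state *whole)
{
	/* The device itself is exclusively opened. */
	if (dev->holders)
		return true;

	/* The containing disk has a real holder, not just the marker. */
	return whole->holder != NULL &&
	       whole->holder != &partition_claim_marker;
}
```

Under these assumptions, writing to a partition whose sibling is mounted stays quiet, while writing to a partition of a disk that is itself exclusively opened produces the hint.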
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 6adb17cc7dbb..b4bb16d79d78 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1820,6 +1820,18 @@ int blkdev_get(struct block_device *bdev, fmode_t mode, void *holder)
 
 		mutex_unlock(&bdev->bd_mutex);
 		bdput(whole);
+	} else if (!res && (mode & FMODE_WRITE)) {
+		spin_lock(&bdev_lock);
+		/*
+		 * Open an exclusive opened device for write may
+		 * probability corrupt the device, such as a
+		 * mounted file system, give a hint here.
+		 */
+		if (bdev->bd_holders ||
+		    ((bdev->bd_contains->bd_holder != NULL) && (bdev->bd_contains->bd_holder != bd_may_claim)))
+			blkdev_dump_conflict_opener(bdev,
+				"VFS: Open an exclusive opened block device for write");
+		spin_unlock(&bdev_lock);
 	}
 
 	if (res)
From: Li Lingfeng <lilingfeng3@huawei.com>

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I53Q6M
CVE: NA
---------------------------
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
---
 include/linux/fs.h | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index a8e36afa66b0..6363c0a67af5 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -462,11 +462,10 @@ struct request_queue;
 struct block_device {
 	dev_t			bd_dev;  /* not a kdev_t - it's a search key */
 	int			bd_openers;
+	int			bd_write_openers;
 	struct inode *		bd_inode;	/* will die */
 	struct super_block *	bd_super;
 	struct mutex		bd_mutex;	/* open/close mutex */
-	int			bd_write_openers;
-	int			bd_part_write_openers;
 	void *			bd_claiming;
 	void *			bd_holder;
 	int			bd_holders;
@@ -498,7 +497,11 @@ struct block_device {
 	/* Mutex for freeze */
 	struct mutex		bd_fsfreeze_mutex;
 
+#ifndef __GENKSYMS__
+	int			bd_part_write_openers;
+#else
 	KABI_RESERVE(1)
+#endif
 	KABI_RESERVE(2)
 	KABI_RESERVE(3)
 	KABI_RESERVE(4)