mailweb.openeuler.org
Kernel
kernel@openeuler.org

  • 27 participants
  • 18548 discussions
[PATCH OLK-5.10] dm: revert partial fix for redundant bio-based IO accounting
by Li Nan 25 Mar '24

From: Mike Snitzer <snitzer(a)redhat.com>

mainline inclusion
from mainline-v5.17-rc2
commit f524d9c95fab54783d0038f7a3e8c014d5b56857
category: bugfix
bugzilla: 189706, https://gitee.com/openeuler/kernel/issues/I9BD67

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

Reverts a1e1cb72d9649 ("dm: fix redundant IO accounting for bios that
need splitting") because it was too narrow in scope (only addressed
redundant 'sectors[]' accounting and not ios, nsecs[], etc).

Cc: stable(a)vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
Link: https://lore.kernel.org/r/20220128155841.39644-3-snitzer@redhat.com
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>

Conflict:
  drivers/md/dm.c
  Context changed, do not affect the logic of the code.

Signed-off-by: Li Nan <linan122(a)huawei.com>
---
 drivers/md/dm.c | 15 ---------------
 1 file changed, 15 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 654aae1dd777..1c79eaede4df 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1610,9 +1610,6 @@ static void init_clone_info(struct clone_info *ci, struct mapped_device *md,
 	ci->sector = bio->bi_iter.bi_sector;
 }
 
-#define __dm_part_stat_sub(part, field, subnd)	\
-	(part_stat_get(part, field) -= (subnd))
-
 /*
  * Entry point to split a bio into clones and submit them to the targets.
  */
@@ -1650,18 +1647,6 @@ static blk_qc_t __split_and_process_bio(struct mapped_device *md,
 						  GFP_NOIO, &md->queue->bio_split);
 			ci.io->orig_bio = b;
 
-			/*
-			 * Adjust IO stats for each split, otherwise upon queue
-			 * reentry there will be redundant IO accounting.
-			 * NOTE: this is a stop-gap fix, a proper fix involves
-			 * significant refactoring of DM core's bio splitting
-			 * (by eliminating DM's splitting and just using bio_split)
-			 */
-			part_stat_lock();
-			__dm_part_stat_sub(&dm_disk(md)->part0,
-					   sectors[op_stat_group(bio_op(bio))], ci.sector_count);
-			part_stat_unlock();
-
 			bio_chain(b, bio);
 			trace_block_split(md->queue, b, bio->bi_iter.bi_sector);
 			ret = submit_bio_noacct(bio);
-- 
2.39.2
[PATCH openEuler-1.0-LTS] dm: revert partial fix for redundant bio-based IO accounting
by Li Nan 25 Mar '24

From: Mike Snitzer <snitzer(a)redhat.com>

mainline inclusion
from mainline-v5.17-rc2
commit f524d9c95fab54783d0038f7a3e8c014d5b56857
category: bugfix
bugzilla: 189706, https://gitee.com/openeuler/kernel/issues/I9BD67

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

Reverts a1e1cb72d9649 ("dm: fix redundant IO accounting for bios that
need splitting") because it was too narrow in scope (only addressed
redundant 'sectors[]' accounting and not ios, nsecs[], etc).

Cc: stable(a)vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer(a)redhat.com>
Link: https://lore.kernel.org/r/20220128155841.39644-3-snitzer@redhat.com
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>

Conflict:
  drivers/md/dm.c
  Context changed, do not affect the logic of the code.

Signed-off-by: Li Nan <linan122(a)huawei.com>
---
 drivers/md/dm.c | 14 --------------
 1 file changed, 14 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 1d2ae304f096..f964a0818ddf 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1638,8 +1638,6 @@ static void init_clone_info(struct clone_info *ci, struct mapped_device *md,
 	ci->sector = bio->bi_iter.bi_sector;
 }
 
-#define __dm_part_stat_sub(part, field, subnd)	\
-	(part_stat_get(part, field) -= (subnd))
 /*
  * Entry point to split a bio into clones and submit them to the targets.
  */
@@ -1689,18 +1687,6 @@ static blk_qc_t __split_and_process_bio(struct mapped_device *md,
 						  GFP_NOIO, &md->queue->bio_split);
 			ci.io->orig_bio = b;
 
-			/*
-			 * Adjust IO stats for each split, otherwise upon queue
-			 * reentry there will be redundant IO accounting.
-			 * NOTE: this is a stop-gap fix, a proper fix involves
-			 * significant refactoring of DM core's bio splitting
-			 * (by eliminating DM's splitting and just using bio_split)
-			 */
-			part_stat_lock();
-			__dm_part_stat_sub(&dm_disk(md)->part0,
-					   sectors[op_stat_group(bio_op(bio))], ci.sector_count);
-			part_stat_unlock();
-
 			bio_chain(b, bio);
 			ret = generic_make_request(bio);
 			break;
-- 
2.39.2
[PATCH OLK-5.10] netfilter: nf_tables: disallow timeout for anonymous sets
by Zhengchao Shao 25 Mar '24

From: Pablo Neira Ayuso <pablo(a)netfilter.org>

mainline inclusion
from mainline-v6.4-rc2
commit e26d3009efda338f19016df4175f354a9bd0a4ab
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I9AK6C
CVE: CVE-2023-52620

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

Never used from userspace, disallow these parameters.

Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>

Conflicts:
  net/netfilter/nf_tables_api.c

Signed-off-by: Zhengchao Shao <shaozhengchao(a)huawei.com>
---
 net/netfilter/nf_tables_api.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index d24e9b7834e4..c13b9fc00ca6 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -4425,6 +4425,9 @@ static int nf_tables_newset(struct net *net, struct sock *nlsk,
 		if (!(flags & NFT_SET_TIMEOUT))
 			return -EINVAL;
 
+		if (flags & NFT_SET_ANONYMOUS)
+			return -EOPNOTSUPP;
+
 		err = nf_msecs_to_jiffies64(nla[NFTA_SET_TIMEOUT], &timeout);
 		if (err)
 			return err;
@@ -4433,6 +4436,10 @@ static int nf_tables_newset(struct net *net, struct sock *nlsk,
 	if (nla[NFTA_SET_GC_INTERVAL] != NULL) {
 		if (!(flags & NFT_SET_TIMEOUT))
 			return -EINVAL;
+
+		if (flags & NFT_SET_ANONYMOUS)
+			return -EOPNOTSUPP;
+
 		gc_int = ntohl(nla_get_be32(nla[NFTA_SET_GC_INTERVAL]));
 	}
 
-- 
2.34.1
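Both added checks follow the same ordering: a timeout or GC-interval attribute is rejected as invalid when the set lacks the timeout flag, and refused as unsupported when the set is anonymous. The following is only a userspace sketch of that ordering, not the kernel code. The names `DEMO_SET_ANONYMOUS`, `DEMO_SET_TIMEOUT` and `demo_check_timeout_attr` are hypothetical stand-ins chosen for illustration, and the flag values are assumptions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the set flags tested in the patch. */
#define DEMO_SET_ANONYMOUS 0x1u
#define DEMO_SET_TIMEOUT   0x10u

/*
 * Mirror the order of the checks in the patch: a timeout-style attribute
 * without the timeout flag is invalid, and one on an anonymous set is
 * refused as unsupported. Returns 0 when the attribute may be parsed.
 */
int demo_check_timeout_attr(uint32_t flags, bool attr_present)
{
	if (!attr_present)
		return 0;
	if (!(flags & DEMO_SET_TIMEOUT))
		return -EINVAL;
	if (flags & DEMO_SET_ANONYMOUS)
		return -EOPNOTSUPP;
	return 0;
}
```

With these assumed values, a caller passing `DEMO_SET_TIMEOUT | DEMO_SET_ANONYMOUS` with the attribute present gets `-EOPNOTSUPP`, which is the case the patch newly rejects.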
[PATCH openEuler-1.0-LTS] netfilter: nf_tables: disallow timeout for anonymous sets
by Zhengchao Shao 25 Mar '24

From: Pablo Neira Ayuso <pablo(a)netfilter.org>

mainline inclusion
from mainline-v6.4-rc2
commit e26d3009efda338f19016df4175f354a9bd0a4ab
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I9AK6C
CVE: CVE-2023-52620

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…

--------------------------------

Never used from userspace, disallow these parameters.

Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>

Conflicts:
  net/netfilter/nf_tables_api.c

Signed-off-by: Zhengchao Shao <shaozhengchao(a)huawei.com>
---
 net/netfilter/nf_tables_api.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 3ee5beb220bd..0f1a343d5b82 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -3549,6 +3549,9 @@ static int nf_tables_newset(struct net *net, struct sock *nlsk,
 		if (!(flags & NFT_SET_TIMEOUT))
 			return -EINVAL;
 
+		if (flags & NFT_SET_ANONYMOUS)
+			return -EOPNOTSUPP;
+
 		err = nf_msecs_to_jiffies64(nla[NFTA_SET_TIMEOUT], &timeout);
 		if (err)
 			return err;
@@ -3557,6 +3560,10 @@ static int nf_tables_newset(struct net *net, struct sock *nlsk,
 	if (nla[NFTA_SET_GC_INTERVAL] != NULL) {
 		if (!(flags & NFT_SET_TIMEOUT))
 			return -EINVAL;
+
+		if (flags & NFT_SET_ANONYMOUS)
+			return -EOPNOTSUPP;
+
 		gc_int = ntohl(nla_get_be32(nla[NFTA_SET_GC_INTERVAL]));
 	}
 
-- 
2.34.1
[PATCH openEuler-1.0-LTS v6 0/2] CVE-2021-47110
by liwei 25 Mar '24

CVE-2021-47110

Kirill A. Shutemov (1):
  x86/kvm: Do not try to disable kvmclock if it was not enabled

Vitaly Kuznetsov (1):
  x86/kvm: Disable kvmclock on all CPUs on shutdown

 arch/x86/include/asm/kvm_para.h |  4 ++--
 arch/x86/kernel/kvm.c           |  1 +
 arch/x86/kernel/kvmclock.c      | 17 +++++++++--------
 3 files changed, 12 insertions(+), 10 deletions(-)

-- 
2.25.1
[PATCH openEuler-1.0-LTS v5 0/2] CVE-2021-47110
by liwei 25 Mar '24

CVE-2021-47110

Kirill A. Shutemov (1):
  x86/kvm: Do not try to disable kvmclock if it was not enabled

Vitaly Kuznetsov (1):
  x86/kvm: Disable kvmclock on all CPUs on shutdown

 arch/x86/include/asm/kvm_para.h |  4 ++--
 arch/x86/kernel/kvm.c           |  1 +
 arch/x86/kernel/kvmclock.c      | 17 +++++++++--------
 3 files changed, 12 insertions(+), 10 deletions(-)

-- 
2.25.1
[PATCH OLK-5.10] bus: mhi: host: Drop chan lock before queuing buffers
by Guan Jing 25 Mar '24

From: Qiang Yu <quic_qianyu(a)quicinc.com>

stable inclusion
from stable-v5.10.209
commit 20a6dea2d1c68d4e03c6bb50bc12e72e226b5c0e
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I97NHA
CVE: CVE-2023-52493

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…

--------------------------------

Ensure read and write locks for the channel are not taken in succession by
dropping the read lock from parse_xfer_event() such that a callback given
to client can potentially queue buffers and acquire the write lock in that
process. Any queueing of buffers should be done without channel read lock
acquired as it can result in multiple locks and a soft lockup.

Cc: <stable(a)vger.kernel.org> # 5.7
Fixes: 1d3173a3bae7 ("bus: mhi: core: Add support for processing events from client device")
Signed-off-by: Qiang Yu <quic_qianyu(a)quicinc.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo(a)quicinc.com>
Tested-by: Jeffrey Hugo <quic_jhugo(a)quicinc.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
Link: https://lore.kernel.org/r/1702276972-41296-3-git-send-email-quic_qianyu@qui…
[mani: added fixes tag and cc'ed stable]
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Guan Jing <guanjing6(a)huawei.com>
---
 drivers/bus/mhi/host/main.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/bus/mhi/host/main.c b/drivers/bus/mhi/host/main.c
index 82d6e9e8bddf..b0a83a725069 100644
--- a/drivers/bus/mhi/host/main.c
+++ b/drivers/bus/mhi/host/main.c
@@ -570,6 +570,8 @@ static int parse_xfer_event(struct mhi_controller *mhi_cntrl,
 			mhi_del_ring_element(mhi_cntrl, tre_ring);
 			local_rp = tre_ring->rp;
 
+			read_unlock_bh(&mhi_chan->lock);
+
 			/* notify client */
 			mhi_chan->xfer_cb(mhi_chan->mhi_dev, &result);
 
@@ -592,6 +594,8 @@ static int parse_xfer_event(struct mhi_controller *mhi_cntrl,
 					kfree(buf_info->cb_buf);
 				}
 			}
+
+			read_lock_bh(&mhi_chan->lock);
 		}
 		break;
 	} /* CC_EOT */
-- 
2.34.1
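The locking order described in the commit message can be illustrated with a toy model. This is a single-threaded sketch under stated assumptions, not the MHI implementation. `struct demo_rwlock` and every `demo_*` function are hypothetical, and the try-lock only reports whether a writer could get in, whereas a real lock would wait at that point.

```c
#include <stdbool.h>

/* Toy reader/writer lock: counts readers, tracks one writer. Not thread-safe. */
struct demo_rwlock {
	int readers;
	bool writer;
};

void demo_read_lock(struct demo_rwlock *l)   { l->readers++; }
void demo_read_unlock(struct demo_rwlock *l) { l->readers--; }

/* A writer can only get in when nobody holds the lock in any mode. */
bool demo_write_trylock(struct demo_rwlock *l)
{
	if (l->readers > 0 || l->writer)
		return false;
	l->writer = true;
	return true;
}

void demo_write_unlock(struct demo_rwlock *l) { l->writer = false; }

/* Hypothetical client callback: queuing a buffer needs the write lock. */
bool demo_client_cb(struct demo_rwlock *l)
{
	if (!demo_write_trylock(l))
		return false;	/* a real lock would wait here instead */
	demo_write_unlock(l);
	return true;
}

/*
 * Event handler entered with the read lock held. With drop_lock set it
 * follows the patched order: unlock, call back, re-lock.
 */
bool demo_handle_event(struct demo_rwlock *l, bool drop_lock)
{
	bool queued;

	if (drop_lock)
		demo_read_unlock(l);
	queued = demo_client_cb(l);
	if (drop_lock)
		demo_read_lock(l);
	return queued;
}
```

In this model, calling `demo_handle_event()` with `drop_lock` false while the read lock is held makes the callback fail to take the write lock. With `drop_lock` true the callback succeeds and the reader count is restored on return.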
[PATCH OLK-6.6] ext4: Validate inode pa before using preallocation blocks
by Zhihao Cheng 25 Mar '24

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I97HJA
CVE: NA

--------------------------------

In ext4 continue & no-journal mode, physical blocks could be allocated
more than once (caused by writing extent entries failed & reclaiming
extent cache) in preallocation process, which could trigger a BUG_ON
(pa->pa_free < len) in ext4_mb_use_inode_pa().

 kernel BUG at fs/ext4/mballoc.c:4681!
 invalid opcode: 0000 [#1] PREEMPT SMP
 CPU: 3 PID: 97 Comm: kworker/u8:3 Not tainted 6.8.0-rc7
 RIP: 0010:ext4_mb_use_inode_pa+0x1b6/0x1e0
 Call Trace:
  ext4_mb_use_preallocated.constprop.0+0x19e/0x540
  ext4_mb_new_blocks+0x220/0x1f30
  ext4_ext_map_blocks+0xf3c/0x2900
  ext4_map_blocks+0x264/0xa40
  ext4_do_writepages+0xb15/0x1400
  do_writepages+0x8c/0x260
  writeback_sb_inodes+0x224/0x720
  wb_writeback+0xd8/0x580
  wb_workfn+0x148/0x820

Details are shown as following:
0. Given a file with i_size=4096 with one mapped block
1. Write block no 1, blocks 1~3 are preallocated.
   ext4_ext_map_blocks
   ext4_mb_normalize_request
   size = 16 * 1024
   size = end - start // Allocate 3 blocks (bs = 4096)
   ext4_mb_regular_allocator
   ext4_mb_regular_allocator
   ext4_mb_regular_allocator
   ext4_mb_use_inode_pa
   pa->pa_free -= len // 3 - 1 = 2
2. Extent buffer head is written failed, es cache and buffer head are
   reclaimed.
3. Write blocks 1~3
   ext4_ext_map_blocks
   newex.ee_len = 3
   ext4_ext_check_overlap // Find nothing, there should have been block 1
   allocated = map->m_len // 3
   ext4_mb_new_blocks
   ext4_mb_use_preallocated
   ext4_mb_use_inode_pa
   BUG_ON(pa->pa_free < len) // 2 < 3!

Fix it by adding validation checking for inode pa. If invalid pa is
detected, stop using inode preallocation, drop invalid pa to avoid it
being used again, mark group block bitmap as corrupted to avoid allocating
from the erroneous group.

Fetch a reproducer in Link.

Cc: stable(a)vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218576
Signed-off-by: Zhihao Cheng <chengzhihao1(a)huawei.com>
Signed-off-by: Zhang Yi <yi.zhang(a)huawei.com>
---
 fs/ext4/mballoc.c | 128 +++++++++++++++++++++++++++++++++++-----------
 1 file changed, 98 insertions(+), 30 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index ea5ac2636632..82aef3072162 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -422,6 +422,9 @@ static void ext4_mb_new_preallocation(struct ext4_allocation_context *ac);
 static bool ext4_mb_good_group(struct ext4_allocation_context *ac,
 			       ext4_group_t group, enum criteria cr);
 
+static void ext4_mb_put_pa(struct ext4_allocation_context *ac,
+			   struct super_block *sb, struct ext4_prealloc_space *pa);
+
 static int ext4_try_to_trim_range(struct super_block *sb,
 		struct ext4_buddy *e4b, ext4_grpblk_t start,
 		ext4_grpblk_t max, ext4_grpblk_t minblocks);
@@ -4783,6 +4786,79 @@ ext4_mb_pa_goal_check(struct ext4_allocation_context *ac,
 	return true;
 }
 
+/*
+ * check if found pa is valid
+ */
+static bool ext4_mb_pa_is_valid(struct ext4_allocation_context *ac,
+				struct ext4_prealloc_space *pa)
+{
+	struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
+	ext4_fsblk_t start;
+	ext4_fsblk_t end;
+	int len;
+
+	if (unlikely(pa->pa_free == 0)) {
+		/*
+		 * We found a valid overlapping pa but couldn't use it because
+		 * it had no free blocks. This should ideally never happen
+		 * because:
+		 *
+		 * 1. When a new inode pa is added to rbtree it must have
+		 *    pa_free > 0 since otherwise we won't actually need
+		 *    preallocation.
+		 *
+		 * 2. An inode pa that is in the rbtree can only have it's
+		 *    pa_free become zero when another thread calls:
+		 *      ext4_mb_new_blocks
+		 *       ext4_mb_use_preallocated
+		 *        ext4_mb_use_inode_pa
+		 *
+		 * 3. Further, after the above calls make pa_free == 0, we will
+		 *    immediately remove it from the rbtree in:
+		 *      ext4_mb_new_blocks
+		 *       ext4_mb_release_context
+		 *        ext4_mb_put_pa
+		 *
+		 * 4. Since the pa_free becoming 0 and pa_free getting removed
+		 * from tree both happen in ext4_mb_new_blocks, which is always
+		 * called with i_data_sem held for data allocations, we can be
+		 * sure that another process will never see a pa in rbtree with
+		 * pa_free == 0.
+		 */
+		ext4_msg(ac->ac_sb, KERN_ERR, "invalid pa, free is 0");
+		return false;
+	}
+
+	start = pa->pa_pstart + (ac->ac_o_ex.fe_logical - pa->pa_lstart);
+	end = min(pa->pa_pstart + EXT4_C2B(sbi, pa->pa_len),
+		  start + EXT4_C2B(sbi, ac->ac_o_ex.fe_len));
+	len = EXT4_NUM_B2C(sbi, end - start);
+
+	if (unlikely(start < pa->pa_pstart)) {
+		ext4_msg(ac->ac_sb, KERN_ERR,
+			 "invalid pa, start(%llu) < pa_pstart(%llu)",
+			 start, pa->pa_pstart);
+		return false;
+	}
+	if (unlikely(end > pa->pa_pstart + EXT4_C2B(sbi, pa->pa_len))) {
+		ext4_msg(ac->ac_sb, KERN_ERR,
+			 "invalid pa, end(%llu) > pa_pstart(%llu) + pa_len(%d)",
+			 end, pa->pa_pstart, EXT4_C2B(sbi, pa->pa_len));
+		return false;
+	}
+	if (unlikely(pa->pa_free < len)) {
+		ext4_msg(ac->ac_sb, KERN_ERR,
+			 "invalid pa, pa_free(%d) < len(%d)", pa->pa_free, len);
+		return false;
+	}
+	if (unlikely(len <= 0)) {
+		ext4_msg(ac->ac_sb, KERN_ERR, "invalid pa, len(%d) <= 0", len);
+		return false;
+	}
+
+	return true;
+}
+
 /*
  * search goal blocks in preallocated space
  */
@@ -4906,45 +4982,37 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
 		goto try_group_pa;
 	}
 
-	if (tmp_pa->pa_free && likely(ext4_mb_pa_goal_check(ac, tmp_pa))) {
+	if (unlikely(!ext4_mb_pa_is_valid(ac, tmp_pa))) {
+		ext4_group_t group;
+
+		tmp_pa->pa_free = 0;
+		atomic_inc(&tmp_pa->pa_count);
+		spin_unlock(&tmp_pa->pa_lock);
+		read_unlock(&ei->i_prealloc_lock);
+
+		ext4_mb_put_pa(ac, ac->ac_sb, tmp_pa);
+		group = ext4_get_group_number(ac->ac_sb, tmp_pa->pa_pstart);
+		ext4_lock_group(ac->ac_sb, group);
+		ext4_mark_group_bitmap_corrupted(ac->ac_sb, group,
+				EXT4_GROUP_INFO_BBITMAP_CORRUPT);
+		ext4_unlock_group(ac->ac_sb, group);
+		ext4_error(ac->ac_sb, "drop pa and mark group %u block bitmap corrupted",
+			   group);
+		WARN_ON_ONCE(1);
+		goto try_group_pa_unlocked;
+	}
+
+	if (likely(ext4_mb_pa_goal_check(ac, tmp_pa))) {
 		atomic_inc(&tmp_pa->pa_count);
 		ext4_mb_use_inode_pa(ac, tmp_pa);
 		spin_unlock(&tmp_pa->pa_lock);
 		read_unlock(&ei->i_prealloc_lock);
 		return true;
-	} else {
-		/*
-		 * We found a valid overlapping pa but couldn't use it because
-		 * it had no free blocks. This should ideally never happen
-		 * because:
-		 *
-		 * 1. When a new inode pa is added to rbtree it must have
-		 *    pa_free > 0 since otherwise we won't actually need
-		 *    preallocation.
-		 *
-		 * 2. An inode pa that is in the rbtree can only have it's
-		 *    pa_free become zero when another thread calls:
-		 *      ext4_mb_new_blocks
-		 *       ext4_mb_use_preallocated
-		 *        ext4_mb_use_inode_pa
-		 *
-		 * 3. Further, after the above calls make pa_free == 0, we will
-		 *    immediately remove it from the rbtree in:
-		 *      ext4_mb_new_blocks
-		 *       ext4_mb_release_context
-		 *        ext4_mb_put_pa
-		 *
-		 * 4. Since the pa_free becoming 0 and pa_free getting removed
-		 * from tree both happen in ext4_mb_new_blocks, which is always
-		 * called with i_data_sem held for data allocations, we can be
-		 * sure that another process will never see a pa in rbtree with
-		 * pa_free == 0.
-		 */
-		WARN_ON_ONCE(tmp_pa->pa_free == 0);
 	}
 	spin_unlock(&tmp_pa->pa_lock);
 try_group_pa:
 	read_unlock(&ei->i_prealloc_lock);
+try_group_pa_unlocked:
 
 	/* can we use group allocation? */
 	if (!(ac->ac_flags & EXT4_MB_HINT_GROUP_ALLOC))
-- 
2.31.1
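The arithmetic behind the new validation can be tried in userspace. The sketch below is a simplified illustration, not the kernel helper. `struct demo_pa` and `demo_pa_is_valid` are hypothetical names, all units are blocks, and one block per cluster is assumed so that the `EXT4_C2B()` and `EXT4_NUM_B2C()` conversions drop out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a preallocation; units are blocks. */
struct demo_pa {
	uint64_t pstart;	/* first physical block */
	uint64_t lstart;	/* first logical block */
	int len;		/* total blocks in the pa */
	int free;		/* blocks not yet handed out */
};

/*
 * Same checks as the patch's helper, assuming one block per cluster.
 * Returns false when the request cannot be served consistently from pa.
 */
bool demo_pa_is_valid(const struct demo_pa *pa, uint64_t req_logical,
		      int req_len)
{
	uint64_t pa_end, start, want_end, end;
	int len;

	if (pa->free == 0)
		return false;

	pa_end = pa->pstart + (uint64_t)pa->len;
	start = pa->pstart + (req_logical - pa->lstart);
	want_end = start + (uint64_t)req_len;
	end = want_end < pa_end ? want_end : pa_end;
	len = (int)(end - start);

	if (start < pa->pstart)
		return false;
	if (end > pa_end)
		return false;
	if (pa->free < len)
		return false;
	if (len <= 0)
		return false;
	return true;
}
```

Feeding in the scenario from the commit message (a pa covering logical blocks 1~3 with `free` already reduced to 2, then a request for 3 blocks starting at block 1) makes the `free < len` check fail, which is the condition that previously hit the `BUG_ON`.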
[PATCH openEuler-1.0-LTS] ext4: Validate inode pa before using preallocation blocks
by Zhihao Cheng 25 Mar '24

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I97HJA
CVE: NA

--------------------------------

In ext4 continue & no-journal mode, physical blocks could be allocated
more than once (caused by writing extent entries failed & reclaiming
extent cache) in preallocation process, which could trigger a BUG_ON
(pa->pa_free < len) in ext4_mb_use_inode_pa().

 kernel BUG at fs/ext4/mballoc.c:4681!
 invalid opcode: 0000 [#1] PREEMPT SMP
 CPU: 3 PID: 97 Comm: kworker/u8:3 Not tainted 6.8.0-rc7
 RIP: 0010:ext4_mb_use_inode_pa+0x1b6/0x1e0
 Call Trace:
  ext4_mb_use_preallocated.constprop.0+0x19e/0x540
  ext4_mb_new_blocks+0x220/0x1f30
  ext4_ext_map_blocks+0xf3c/0x2900
  ext4_map_blocks+0x264/0xa40
  ext4_do_writepages+0xb15/0x1400
  do_writepages+0x8c/0x260
  writeback_sb_inodes+0x224/0x720
  wb_writeback+0xd8/0x580
  wb_workfn+0x148/0x820

Details are shown as following:
0. Given a file with i_size=4096 with one mapped block
1. Write block no 1, blocks 1~3 are preallocated.
   ext4_ext_map_blocks
   ext4_mb_normalize_request
   size = 16 * 1024
   size = end - start // Allocate 3 blocks (bs = 4096)
   ext4_mb_regular_allocator
   ext4_mb_regular_allocator
   ext4_mb_regular_allocator
   ext4_mb_use_inode_pa
   pa->pa_free -= len // 3 - 1 = 2
2. Extent buffer head is written failed, es cache and buffer head are
   reclaimed.
3. Write blocks 1~3
   ext4_ext_map_blocks
   newex.ee_len = 3
   ext4_ext_check_overlap // Find nothing, there should have been block 1
   allocated = map->m_len // 3
   ext4_mb_new_blocks
   ext4_mb_use_preallocated
   ext4_mb_use_inode_pa
   BUG_ON(pa->pa_free < len) // 2 < 3!

Fix it by adding validation checking for inode pa. If invalid pa is
detected, stop using inode preallocation, drop invalid pa to avoid it
being used again, mark group block bitmap as corrupted to avoid allocating
from the erroneous group.

Fetch a reproducer in Link.

Cc: stable(a)vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218576
Signed-off-by: Zhihao Cheng <chengzhihao1(a)huawei.com>
Signed-off-by: Zhang Yi <yi.zhang(a)huawei.com>
---
 fs/ext4/mballoc.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index a40990da0b62..c07289164c12 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -356,6 +356,8 @@ static void ext4_mb_generate_from_pa(struct super_block *sb, void *bitmap,
 					ext4_group_t group);
 static void ext4_mb_generate_from_freelist(struct super_block *sb, void *bitmap,
 						ext4_group_t group);
+static void ext4_mb_put_pa(struct ext4_allocation_context *ac,
+			   struct super_block *sb, struct ext4_prealloc_space *pa);
 
 static inline void *mb_correct_addr_and_bit(int *bit, void *addr)
 {
@@ -3364,6 +3366,47 @@ static void ext4_discard_allocated_blocks(struct ext4_allocation_context *ac)
 		pa->pa_free += ac->ac_b_ex.fe_len;
 }
 
+/*
+ * check if found pa is valid
+ */
+static bool ext4_mb_pa_is_valid(struct ext4_allocation_context *ac,
+				struct ext4_prealloc_space *pa)
+{
+	struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
+	ext4_fsblk_t start;
+	ext4_fsblk_t end;
+	int len;
+
+	start = pa->pa_pstart + (ac->ac_o_ex.fe_logical - pa->pa_lstart);
+	end = min(pa->pa_pstart + EXT4_C2B(sbi, pa->pa_len),
+		  start + EXT4_C2B(sbi, ac->ac_o_ex.fe_len));
+	len = EXT4_NUM_B2C(sbi, end - start);
+
+	if (unlikely(start < pa->pa_pstart)) {
+		ext4_msg(ac->ac_sb, KERN_ERR,
+			 "invalid pa, start(%llu) < pa_pstart(%llu)",
+			 start, pa->pa_pstart);
+		return false;
+	}
+	if (unlikely(end > pa->pa_pstart + EXT4_C2B(sbi, pa->pa_len))) {
+		ext4_msg(ac->ac_sb, KERN_ERR,
+			 "invalid pa, end(%llu) > pa_pstart(%llu) + pa_len(%d)",
+			 end, pa->pa_pstart, EXT4_C2B(sbi, pa->pa_len));
+		return false;
+	}
+	if (unlikely(pa->pa_free < len)) {
+		ext4_msg(ac->ac_sb, KERN_ERR,
+			 "invalid pa, pa_free(%d) < len(%d)", pa->pa_free, len);
+		return false;
+	}
+	if (unlikely(len <= 0)) {
+		ext4_msg(ac->ac_sb, KERN_ERR, "invalid pa, len(%d) <= 0", len);
+		return false;
+	}
+
+	return true;
+}
+
 /*
  * use blocks preallocated to inode
  */
@@ -3483,6 +3526,23 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
 
 		/* found preallocated blocks, use them */
 		spin_lock(&pa->pa_lock);
+		if (unlikely(!ext4_mb_pa_is_valid(ac, pa))) {
+			ext4_group_t group;
+
+			pa->pa_free = 0;
+			atomic_inc(&pa->pa_count);
+			spin_unlock(&pa->pa_lock);
+			rcu_read_unlock();
+			ext4_mb_put_pa(ac, ac->ac_sb, pa);
+			group = ext4_get_group_number(ac->ac_sb, pa->pa_pstart);
+			ext4_lock_group(ac->ac_sb, group);
+			ext4_mark_group_bitmap_corrupted(ac->ac_sb, group,
+					EXT4_GROUP_INFO_BBITMAP_CORRUPT);
+			ext4_unlock_group(ac->ac_sb, group);
+			ext4_error(ac->ac_sb, "drop pa and mark group %u block bitmap corrupted",
+				   group);
+			goto try_group_pa;
+		}
 		if (pa->pa_deleted == 0 && pa->pa_free) {
 			atomic_inc(&pa->pa_count);
 			ext4_mb_use_inode_pa(ac, pa);
@@ -3495,6 +3555,7 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
 	}
 	rcu_read_unlock();
 
+try_group_pa:
 	/* can we use group allocation? */
 	if (!(ac->ac_flags & EXT4_MB_HINT_GROUP_ALLOC))
 		return 0;
-- 
2.31.1
[PATCH OLK-5.10] ext4: Validate inode pa before using preallocation blocks
by Zhihao Cheng 25 Mar '24

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I97HJA
CVE: NA

--------------------------------

In ext4 continue & no-journal mode, physical blocks could be allocated
more than once (caused by writing extent entries failed & reclaiming
extent cache) in preallocation process, which could trigger a BUG_ON
(pa->pa_free < len) in ext4_mb_use_inode_pa().

 kernel BUG at fs/ext4/mballoc.c:4681!
 invalid opcode: 0000 [#1] PREEMPT SMP
 CPU: 3 PID: 97 Comm: kworker/u8:3 Not tainted 6.8.0-rc7
 RIP: 0010:ext4_mb_use_inode_pa+0x1b6/0x1e0
 Call Trace:
  ext4_mb_use_preallocated.constprop.0+0x19e/0x540
  ext4_mb_new_blocks+0x220/0x1f30
  ext4_ext_map_blocks+0xf3c/0x2900
  ext4_map_blocks+0x264/0xa40
  ext4_do_writepages+0xb15/0x1400
  do_writepages+0x8c/0x260
  writeback_sb_inodes+0x224/0x720
  wb_writeback+0xd8/0x580
  wb_workfn+0x148/0x820

Details are shown as following:
0. Given a file with i_size=4096 with one mapped block
1. Write block no 1, blocks 1~3 are preallocated.
   ext4_ext_map_blocks
   ext4_mb_normalize_request
   size = 16 * 1024
   size = end - start // Allocate 3 blocks (bs = 4096)
   ext4_mb_regular_allocator
   ext4_mb_regular_allocator
   ext4_mb_regular_allocator
   ext4_mb_use_inode_pa
   pa->pa_free -= len // 3 - 1 = 2
2. Extent buffer head is written failed, es cache and buffer head are
   reclaimed.
3. Write blocks 1~3
   ext4_ext_map_blocks
   newex.ee_len = 3
   ext4_ext_check_overlap // Find nothing, there should have been block 1
   allocated = map->m_len // 3
   ext4_mb_new_blocks
   ext4_mb_use_preallocated
   ext4_mb_use_inode_pa
   BUG_ON(pa->pa_free < len) // 2 < 3!

Fix it by adding validation checking for inode pa. If invalid pa is
detected, stop using inode preallocation, drop invalid pa to avoid it
being used again, mark group block bitmap as corrupted to avoid allocating
from the erroneous group.

Fetch a reproducer in Link.

Cc: stable(a)vger.kernel.org
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218576
Signed-off-by: Zhihao Cheng <chengzhihao1(a)huawei.com>
Signed-off-by: Zhang Yi <yi.zhang(a)huawei.com>
---
 fs/ext4/mballoc.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index c4f018dc0b59..2d33c0123b72 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -352,6 +352,9 @@ static void ext4_mb_generate_from_freelist(struct super_block *sb, void *bitmap,
 						ext4_group_t group);
 static void ext4_mb_new_preallocation(struct ext4_allocation_context *ac);
 
+static void ext4_mb_put_pa(struct ext4_allocation_context *ac,
+			   struct super_block *sb, struct ext4_prealloc_space *pa);
+
 /*
  * The algorithm using this percpu seq counter goes below:
  * 1. We sample the percpu discard_pa_seq counter before trying for block
@@ -3812,6 +3815,47 @@ static void ext4_discard_allocated_blocks(struct ext4_allocation_context *ac)
 		pa->pa_free += ac->ac_b_ex.fe_len;
 }
 
+/*
+ * check if found pa is valid
+ */
+static bool ext4_mb_pa_is_valid(struct ext4_allocation_context *ac,
+				struct ext4_prealloc_space *pa)
+{
+	struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
+	ext4_fsblk_t start;
+	ext4_fsblk_t end;
+	int len;
+
+	start = pa->pa_pstart + (ac->ac_o_ex.fe_logical - pa->pa_lstart);
+	end = min(pa->pa_pstart + EXT4_C2B(sbi, pa->pa_len),
+		  start + EXT4_C2B(sbi, ac->ac_o_ex.fe_len));
+	len = EXT4_NUM_B2C(sbi, end - start);
+
+	if (unlikely(start < pa->pa_pstart)) {
+		ext4_msg(ac->ac_sb, KERN_ERR,
+			 "invalid pa, start(%llu) < pa_pstart(%llu)",
+			 start, pa->pa_pstart);
+		return false;
+	}
+	if (unlikely(end > pa->pa_pstart + EXT4_C2B(sbi, pa->pa_len))) {
+		ext4_msg(ac->ac_sb, KERN_ERR,
+			 "invalid pa, end(%llu) > pa_pstart(%llu) + pa_len(%d)",
+			 end, pa->pa_pstart, EXT4_C2B(sbi, pa->pa_len));
+		return false;
+	}
+	if (unlikely(pa->pa_free < len)) {
+		ext4_msg(ac->ac_sb, KERN_ERR,
+			 "invalid pa, pa_free(%d) < len(%d)", pa->pa_free, len);
+		return false;
+	}
+	if (unlikely(len <= 0)) {
+		ext4_msg(ac->ac_sb, KERN_ERR, "invalid pa, len(%d) <= 0", len);
+		return false;
+	}
+
+	return true;
+}
+
 /*
  * use blocks preallocated to inode
  */
@@ -3932,6 +3976,23 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
 
 		/* found preallocated blocks, use them */
 		spin_lock(&pa->pa_lock);
+		if (unlikely(!ext4_mb_pa_is_valid(ac, pa))) {
+			ext4_group_t group;
+
+			pa->pa_free = 0;
+			atomic_inc(&pa->pa_count);
+			spin_unlock(&pa->pa_lock);
+			rcu_read_unlock();
+			ext4_mb_put_pa(ac, ac->ac_sb, pa);
+			group = ext4_get_group_number(ac->ac_sb, pa->pa_pstart);
+			ext4_lock_group(ac->ac_sb, group);
+			ext4_mark_group_bitmap_corrupted(ac->ac_sb, group,
+					EXT4_GROUP_INFO_BBITMAP_CORRUPT);
+			ext4_unlock_group(ac->ac_sb, group);
+			ext4_error(ac->ac_sb, "drop pa and mark group %u block bitmap corrupted",
+				   group);
+			goto try_group_pa;
+		}
 		if (pa->pa_deleted == 0 && pa->pa_free) {
 			atomic_inc(&pa->pa_count);
 			ext4_mb_use_inode_pa(ac, pa);
@@ -3944,6 +4005,7 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
 	}
 	rcu_read_unlock();
 
+try_group_pa:
 	/* can we use group allocation? */
 	if (!(ac->ac_flags & EXT4_MB_HINT_GROUP_ALLOC))
 		return false;
-- 
2.31.1